A book of this sort needs little in the way of formal introduction. Few people need to
be told, in the broadest terms, what a dictionary is, or what lexicographers do—and 
answering both of those questions in less broad terms, from a variety of different view- 
points, is the business of most of the rest of this book. The thirty-seven contributions 
in this volume together constitute a guide to what are, in the editor's view, the most 
significant contours in the geography of the lexicographical world, as well as offering a 
series of eye-witness accounts of the major issues confronting both lexicographers and 
the users of dictionaries today. Nonetheless, it is likely that any reader who has come to 
this handbook relatively new to the world of lexicography will appreciate a few initial 
words of orientation, on what this book does and does not set out to cover, and why it 
has been structured as it has—and it is likely also that the same questions will be of 
interest to those readers who are far from neophytes in this field. 

The structural divisions of the first two parts of this handbook reflect the major dis- 
tinctive types of lexicography found today. Part I considers the synchronic dictionary, 
and especially three characteristic types of synchronic dictionary: 


(1) the dictionary for a general readership (many people’s conception of ‘the 
dictionary’); 

(2) the monolingual dictionary intended for L2 learners; 

(3) the bilingual dictionary. 


These are placed together so that, after their distinctive features have been set out, the 
many common areas of methodology can be considered together, in particular the cen- 
tral role of the corpus in lexicography of this type today. Such dictionaries normally 
share some other features as well: they are all usually synchronic, that is to say, they 
deal with a single chronological period of language use, almost always the period in 
which they are produced; additionally, they are typically intended for a mass market, 
and produced by commercial publishers.

The historical dictionary is treated separately in Part II, to reflect its very different
methodology and approach. Historical dictionaries are typically edited within an aca- 
demic institution and published on its behalf by an academic press, and are typically 
intended for a more specialist readership (although there are exceptions—and indeed 
what is perhaps the world’s best-known historical dictionary, the Oxford English 
Dictionary, is not entirely typical in either of these respects). More fundamentally, the 
diachronic perspective of a historical dictionary produces structural and presenta- 
tional challenges that are quite different from those of synchronic dictionaries, so that 
one can identify methods and procedures that broadly distinguish ‘diachronic lexicog- 
raphy’ from ‘synchronic lexicography’. 

Although they have been picked out because of their importance and their influence, 
the four major types of dictionary discussed in Parts I and II come to much less than 
the total sum of lexicography. In Part III a set of slightly shorter chapters examine some
of the most important types of specialist dictionaries and their salient qualities and 
methodologies. Contributors have been invited to highlight what sets apart a particu- 
lar area of specialist lexicography in which they have a particular interest (as scholarly 
commentators or practitioners or both), but also how that area connects with the wider 
concerns and methodologies of the lexicographical world. I hope that what emerges 
above all from this section is how all dictionaries ultimately form part of an intercon- 
nected network of information about language and languages. (This last point explains 
the inclusion of what some purists may consider an interloper, in the form of a chapter 
on thesauruses.) 

Part IV examines some topics which are common to or have implications for vari- 
ous different types of lexicography, as well as some of the challenges and debates that 
cross the boundaries of various sub-fields. As such, the chapters in this section all take 
up and develop in more depth themes that have necessarily been touched on in a num- 
ber of the earlier chapters. The final two chapters take up themes that, in very differ- 
ent ways, typify the role and place of dictionaries in the wider world: in the case of 
Hilary Nesi’s contribution, how the changing digital environment is having very prac- 
tical effects on how dictionaries are both used and produced, while Stefan Dollinger 
examines some of the political, historical, and social factors that come into play in the 
editing and publication of dictionaries of national varieties of international languages. 

Some readers may feel that historical dictionaries, and dictionaries with a dia- 
chronic slant (such as those describing the origins of place-names or personal names, 
or etymological dictionaries), receive a surprising amount of attention in this book, 
particularly since many shorter overviews of lexicography squeeze such topics into 
a very small space indeed, or even omit them entirely. The intention here is to pro- 
vide the space for a much needed dialogue between two different parts of the dic- 
tionary world today: the synchronic dictionary, generally having a much wider 
reader base, and often subject to considerable (and perhaps simultaneously stimu- 
lating and constricting) commercial pressures; and the historical dictionary, rightly 
or wrongly regarded by many as the ‘prestige’ dictionary par excellence, but also 
associated by some with old-fashioned methods, academicism, stuffiness, or even a
certain hauteur. This book sets out to provide the space for each of these traditions
to be explored in detail and to show at least some of the variety of approaches to be 
found in each tradition today; it also, of course, has the deliberate intention not just of
highlighting what distinguishes these two traditions, but also of drawing into
focus the very many similarities in both the challenges and opportunities each faces
in the contemporary environment. 

Nearly all of the contributions in this book are from people who have practical expe- 
rience of lexicographical work. A large number are from people who are lexicographers 
first and foremost, and who have been encouraged to draw on their own practical expe- 
rience in their contributions here. Accordingly, alongside the overviews that would be 
expected of any handbook, many of the chapters present longer or shorter case studies 
arising from the contributors’ practical work. I hope that these will serve to give the 
readers of this book a feeling for the practice of lexicography, as well as a direct insight 
into the kinds of decisions made in different parts of the lexicographical world on a 
daily basis. 

The main focus of this book is the world of lexicography today (whether synchronic 
or diachronic), not the history of lexicography, but in many important areas some 
awareness of history helps inform an understanding of the contemporary scene. The 
introductory chapters on major dictionary types by Béjoint, Heuberger, Fontenelle, 
and Considine do each discuss the history and development of the dictionary type in
question (respectively dictionaries for general users, dictionaries for learners, bilin- 
gual dictionaries, and historical dictionaries), as well as current issues affecting each 
of these dictionary types. A specially commissioned annotated chronology of major 
events in the history of lexicography by John Considine at the end of this volume may 
be a useful aid to placing historical references elsewhere in the book in context. Had 
the primary aim been to concentrate not on the state of lexicography today but on 
how lexicography came to be as it is today, it is likely that the following major shifts 
in the tectonic plates of the lexicographical world would bulk large in the structure of 
this book: 


• Many members of the general public think not of ‘a dictionary’ but of ‘the dictionary’,
by which is meant a general dictionary of a language, regarded as a
source of authority on the meaning of words and on many other matters concern- 
ing language. Nevertheless, it was only very slowly that the general dictionary 
of a language emerged as a fundamental type, alongside the specialist lexicon of 
hard words, or of the technical vocabulary of a particular field, and alongside the 
bilingual dictionary. Here, justly or unjustly, Samuel Johnson's Dictionary of the 
English Language of 1755 has long been regarded as the most significant landmark 
publication in the history of general dictionaries of English. 

• The outstanding development of the nineteenth century was the historical dictionary,
with an associated empirical methodology that has stood ever since as one
of the cornerstones of historical linguistics, and which long exercised a deep influence
on the structure and design of most other dictionary types.

• In the mid-twentieth century, learners’ dictionaries broke new ground in describing
typical patterns of usage, supported by examples of typical use, as well as by
extensive coverage of questions of grammar and usage; they also drew new atten- 
tion to the type of language employed in definitions. 

• From the 1980s onwards, developments in corpus linguistics enabled lexicographers
to go much further in identifying typical word behaviour and patterns of collocation, 
heralding a revolution in the compilation and design of all synchronic dictionaries. 

• From the late 1980s, and in earnest from the turn of the millennium, digital publication
(at first on CD-ROMs and later on the web) and the availability of electronic
text databases began to transform both the compilation and the publication of his- 
torical dictionaries, and began to enable users to pose and answer new questions 
using those dictionaries. 

• From the early 2000s, online publication has transformed access to all types of
dictionaries, but traditional print sales of most dictionary types have been hit 
hard, and traditional dictionaries that have migrated from print to the digital 
world have frequently found themselves co-existing with new types of lexical 
resources, and often competing with them for readers. 


There is more that unites than divides all lexicography, and this book has been devised 
and structured with the underlying assumption that every chapter contributes some- 
thing to building an overview of the wide and varied world of lexicography today. All 
sections of this book would have been almost infinitely extendible, but it is perhaps in 
Part III that absences will be felt most keenly; in a project of this size, it is inevitable that 
some contributions will fall by the wayside, and I particularly regret the loss of chapters 
on dictionaries of sign language, and on the particular challenges faced by lexicogra- 
phers of endangered languages, of less-used languages, or of languages spoken in less 
well-resourced communities (for instance, many of the languages of Africa). Just as inevitably,
other readers will doubtless feel that other topics could or should have had a greater
claim for inclusion than some that have been included; every editor has to make difficult 
choices, and the world of lexicography is so large and varied that even in a volume of this 
size a very selective approach has had to be taken. I would encourage all readers to make 
good use of the pointers to the wider literature on lexicography in every chapter in this 
volume, and to treat this handbook as an invitation to further exploration.


1 This introduction has avoided singling out any among the many good introductions and 
surveys covering various specific aspects of lexicography that exist today; the best are all referred 
to at appropriate places in the individual chapters in this volume. However, an exception can and 
should be made for these topics singled out here as omitted from this volume: on dictionaries of sign 
language, see Schermer (2006), Zwitserlood (2010), and Zwitserlood et al. (2013); for some orientation 
in recent work on lexicography of endangered languages, less-used languages, and less well-resourced 
languages, see variously Haviland (2006), Ogilvie (2010, 2011), Popkema (2010), Mosel (2011), Prinsloo 
(2012), and Benjamin and Radetzky (2014a, 2014b).

2.1 WHAT IS A DICTIONARY FOR GENERAL USERS?


Everybody knows what a dictionary for general users (DGU) is, yet the notion is difficult
to define!

Saying that it is a dictionary for the general public and for general use, giving general 
information, or information on general language, does not help much. We know that 
it is not bilingual or multilingual and that it is not in many volumes, nor very expen- 
sive; it is not thematically ordered, its wordlist is not limited to a particular category of 
words, it is not exclusively encyclopaedic, and it is not meant for a special category of 
users: children, students, or scholars. It is the one-volume dictionary that sits on the 
shelves of almost every family, one of our biggest books. It is the dictionary that we 
inherited from our parents, that we open from time to time to check the spelling of a 
word or to play Scrabble® with friends, the dictionary that we buy for our children when
they become teenagers. It is the book that we consult when we want to know whether a 
word belongs to the language, the book that has the last word on questions of usage. It is 
the dictionary, the prototype of the genre.


1 Many thanks to Philip Durkin, who was supportive at all stages of the writing of this chapter, to
John Considine, who helped me reduce the text to a reasonable size, to Valerie Grundy who helped 
me improve the first version, and to Jeremy Harrison, who checked my English. Any remaining errors 
are mine.

The dictionary for all the family is a modern invention. In a history of lexicography
that began approximately 5,000 years ago, it only appeared in the last 200. It is one of 
the late products of this history, although its features appeared one after the other in 
the various types of dictionaries that have been published since the first word lists in 
Sumer. This chapter traces their emergence and the birth of the genre, and then exam- 
ines modern DGUs. It ends with a few thoughts on how the DGU might evolve in the 
future. 


2.2 THE PREHISTORY OF THE DGU 


The ancient civilizations of the Near and Middle East, the Mediterranean, and China 
invented almost all the lexicographical products that were compiled afterwards, and 
more (see Boisson et al. 1991; Boulanger 2003). There were lists of words used in old
texts, collections of the words of the current language, and lists of the names of things. 
Some lists were descriptions of the present, others were evocations of the past. Some 
were about the world, others were about the language. Some were arranged by themes, 
others by formal features, such as the way words were spelt, written, or pronounced. All 
were organized so that they could be consulted repeatedly rather than read from cover 
to cover once only. Some were functional and practical, some were more scientific. 
Some were descriptive, others were prescriptive. Some were monolingual, others were 
bilingual, or bi-dialectal, or multilingual. Many were intended to be used in schools, 
others were for the learned, for priests, for civil servants, for administrators, for rich 
people who needed listings of their possessions, or for poets. They all fixed in writing 
information that otherwise might have been lost, and they offered to a wider public 
information that would otherwise have been reserved for a few. They were designed 
to show and explain, to teach and facilitate the acquisition of knowledge and the mastery
of the language, to educate and socialize, and ultimately to participate in the cohesion
of the community.² By the first centuries CE, virtually all the varieties of reference
materials that we know today had been invented. But no civilization had what could be 
called a DGU. 

The history of lexicography after antiquity began again in Western Europe in the 
Middle Ages and the Renaissance. It can be seen as a series of steps leading to the DGU, 
by way of the bilingual dictionary, the monolingual dictionary, and the general dic- 
tionary, with the encyclopaedia as basso continuo. 


2 A few of the responses to these needs are identified in the Chronology of the present volume, 
which also provides further information on a number of the dictionaries discussed in this 
chapter.

2.2.1 Encyclopaedias


The need to list all the elements of the universe and classify them in an orderly way, 
to discover and expose the hidden order behind apparent chaos, ‘the taxonomic urge’ 
(McArthur 1986: 32), is probably the oldest and strongest motivation for listing words. 
It was present in all the civilizations of antiquity, and it continued in Western Europe 
in various forms.³ A famous early example of a work that could be called an encyclopaedia
was the Etymologiae sive origines of Isidore of Seville, compiled in the seventh century. 
Many others followed, all with different contents and arrangements, each reflecting the 
vision of the organization of the universe of the author and his times. In the seventeenth 
century, French lexicographers re-invented the dictionary of proper names, a genre that 
had existed in Greece and in Rome. Louis Moréri’s Grand Dictionnaire historique, ou 
mélange curieux de l'histoire sacrée et profane (1674) and Pierre Bayle’s Dictionaire his- 
torique et critique (1697) were both used and adapted in several European countries. In 
England, Ephraim Chambers published his Cyclopaedia, or an Universal Dictionary of 
Arts and Sciences in 1728. It was organized like a dictionary in alphabetically ordered 
articles but with many cross-references and a Preface presenting the whole of human 
knowledge in a rationally organized hierarchy. It was the basis of the Encyclopédie ou 
Dictionnaire raisonné des sciences, des arts et des métiers in thirty-five volumes, pub- 
lished between 1751 and 1772 by Denis Diderot and Jean Le Rond d’Alembert, a major 
work in the philosophy of the Enlightenment. The Encyclopédie inspired all the ency- 
clopaedias that followed: the Britannica (1768-71) in England, the Brockhaus (1808) in 
Germany, and many others. They flourished at the same time as general dictionaries, 
with which they shared the ambitions of exhaustiveness and scientific accuracy (see 
Section 2.2.4). 


2.2.2 Bilingual Dictionaries 


The history of dictionaries in Western Europe begins with the bilingual glosses added 
to Latin manuscripts in the Middle Ages. The story is well known, and does not have to 
be detailed here, except that it is also the history of the DGU. These glosses, in simple 
Latin or in the vernacular, or in both, were soon collected to form separate volumes 
called glossaries, in which the entry words had been stripped of the marks of syntactic 
context and the information given in the entries was independent of any particular 
text; this ‘de-contextualization’ is one of the main characteristics of lexicography. As 
time passed, the glossaries listed more words and they gave more information: they 
became dictionaries. The first explained Latin words in a vernacular language, but 
they were soon reversed to produce dictionaries explaining the words of the vernacular 


3 On the history of encyclopaedias, see Rey (1982).

language in Latin. The operation was easy, because the microstructural information
was minimal, often a one-word translation or a short phrase, and the resulting diction- 
aries could be useful for students and for all those who had to write in Latin, clerks in 
law courts, in political institutions, and the like. These dictionaries provided informa- 
tion on the vernacular and its usage, and this was another important step in the direc- 
tion of the DGU. 

Dictionaries of vernacular words with explanations in Latin were published in many 
of the languages of Europe. For English, the earliest example was the Promptorium 
Parvulorum, sive Clericorum attributed to Galfridus Grammaticus, an alphabetical list 
of about 12,000 English words compiled around 1440 and printed in 1499. In Spain, 
Nebrija published a two-way bilingual, the Dictionarium latinum-hispanum et his- 
panum-latinum, in 1495. In France, Estienne’s Dictionaire francoislatin contenant les 
motz et manieres de parler francois tournez en latin was published in 1539, with around 
10,000 entries, each provided with translations in Latin and sometimes explanations 
in French. For German, Josua Maaler (Pictorius) published Die Teütsch Spraach,
Dictionarium germanicolatinum novum in 1561, with about 11,000 entries. 

The next stage was the compilation of bilingual dictionaries between two vernacular 
languages. In England, the first to be printed, in 1547, was A Dictionary in Englyshe and 
Welshe by William Salesbury, for speakers of Welsh, but the demand was mainly for 
English-French and French-English dictionaries. A dictionarie French and English was 
published in 1567 or 1568 and it was followed by many others. In a few decades, bilingual 
and multilingual dictionaries were produced with many of the languages of Europe. 
For example, John Florio’s A Worlde of Wordes, or Dictionarie in Italian and English, 
a dictionary of Italian words with English definitions and translations, was published 
in 1598. 


2.2.3 Monolingual Dictionaries 


From a bilingual dictionary between two vernacular languages it was easy to produce 
a monolingual dictionary, or even two, and there was a demand. In the sixteenth cen- 
tury, vernacular languages were used more and more in all circumstances. More peo- 
ple needed a monolingual dictionary of the vernacular in their professional lives or 
in their intellectual and cultural activities. More people were reading and there were 
more documents to be read—the first periodicals appeared at the beginning of the sixteenth
century.⁴ Also, interest in the kind of knowledge that can be acquired by
reason and by science was increasing. The first monolingual dictionaries appeared at


4 Relation aller Fürnemmen und gedenckwürdigen Historien, in German, printed from 1605
onwards, may have been the first newspaper. The first English-language newspaper, Corrant out of 
Italy, Germany, etc., was published in Amsterdam in 1620. In France, the first was La Gazette in 1631 
(Wikipedia, 13 February 2012).

the same time as cabinets of curiosities, the ancestors of our museums.⁵ Vernacular
languages were also becoming objects of scientific study. 

In England, in the late sixteenth century, school manuals began to feature lists 
of English words, claiming that English was as good as Latin for teaching. Richard 
Mulcaster’s The First Part of the Elementarie (1582) had about 8,000 undefined 
words, and Edmund Coote’s The English Schoole-Maister (1596) had around 1,500 
with short glosses in English. Mulcaster, in a section entitled ‘A perfit English dictionarie
wished for’, declared that a dictionary of English was ‘a thing verie praiseworthy ... if
som one well learned and as laborious a man, wold gather all the words
which we vse in our English tung ... into one dictionarie’. But his wish was not to be
realized soon. 

A Table Alphabeticall, conteyning and teaching the true vvriting and vnderstanding of 
hard vsvall English wordes,⁶ published in 1604 by Robert Cawdrey, is usually regarded
as the first monolingual dictionary of English. It had a general, non-specialized word- 
list, but it was not the type of dictionary that Mulcaster expected. Its 2,500 entries were 
‘hard usual words’, neologisms that were being imported into English, mainly from 
Latin and French. It explained each entry word by a synonym or two, or a short gloss 
in simple language, so that it looked like a bilingual. McArthur (1998a: 202) calls it a 
‘crypto-bilingual’. The Table had a series of successors in the seventeenth century: John 
Bullokar’s English Expositor in 1616, Henry Cockeram’s English Dictionarie in 1623, 
Thomas Blount’s Glossographia in 1656. After that, the English dictionary of hard words 
disappeared. 


2.2.4 The First General Dictionaries 


In the course of the seventeenth century, monolingual dictionaries kept adding more 
words, particularly common words and function words. They were becoming gen- 
eral. Kinds of general monolingual dictionaries had been produced in China in the 
second or third century BCE and in Iraq in the eighth century CE, but in Europe they
were unknown. The problem was not technical: the wordlist of a general monolin- 
gual dictionary is more or less the same as that of a bilingual dictionary and the ele- 
ments of information on pronunciation, spelling, syntax, morphology, etymology, 
meaning, and usage were all known. The problem was, who needed a dictionary con- 
taining the most common words, even those that everyone knew, with their defini- 
tions? Compiling a dictionary has always been a huge investment: the thousands of 
working hours by experts require a large amount of money with limited hopes of 


5 One of the earliest was that of Ole Worm in Denmark (1588-1655). In England, Sir Hans Sloane 
(1660-1753) bequeathed his collection to the Kingdom, and it became the British Museum.

6 The use of the adjective alphabeticall in the title indicates that this was not the obvious choice of
organization for a dictionary. On thematic (or onomasiological or topical) dictionaries, see Hüllen
(1999).

reaping benefits, either for the lexicographer or for the publisher. Why would anyone
take the risk? 

In some European countries, the motivation was to sing the praises of the language,
at a time when nations were taking shape and found themselves competing for
riches, for territories, for prestige, and for influence. General dictionaries were com- 
piled to show how venerable, how rich, how harmonious, how regular the language 
was, how superior it was to all other languages. They became ‘portraits’ of a language 
and emblems of a community. In France, Jean Nicot’s Thresor de la langue francoise, 
published in 1606, two years after Cawdrey’s Table, was not a pure monolingual—its 
16,000 entries were given Latin equivalents as well as definitions—but it had all sorts 
of words and it was explicitly designed to show the richness and beauty of the French 
language. The Tesoro de la lengua castellana o española, published in 1611 by Sebastián
de Covarrubias, explored the origins of Spanish words, to reveal their ‘true’ meanings.

In the seventeenth and eighteenth centuries, a new trend in continental European 
lexicography was the compilation of large standardizing dictionaries sponsored by 
language academies, often biased towards literary vocabulary. In Italy, the Accademia 
della Crusca, created in 1582 to sort out the good language from the bad, as its name 
(literally ‘Academy of the bran’) suggests, published its Vocabolario degli Accademici 
della Crusca in 1612. In France, the Académie française published its Dictionnaire de
l’Académie in 1694. The Real Academia Española’s Diccionario de Autoridades, or
Diccionario de la lengua española, in six volumes, the largest dictionary of Spanish
ever produced, was published in 1726-39. 

In England, the dictionaries that gradually replaced the dictionaries of hard words 
were not produced by an academy but they were more and more general. The New World
of English Words, or a General English Dictionary by E. P. (probably Edward Phillips), 
published in 1658, only two years after Blount’s Glossographia, was the first English 
dictionary to use the word general in its title, but it had only about 11,000 entries. It 
was also the first prescriptive English dictionary: it marked words that the lexicogra- 
pher thought were unacceptable. John Wilkins and William Lloyd claimed that their 
‘Alphabetical Dictionary’, published in 1668, had all the words of English, though it 
only had around 13,000. A New English Dictionary by J. K. (probably John Kersey), published
in 1702, had an explicit subtitle: a Compleat Collection of the Most Proper and
Significant Words, Commonly Used in the Language; With a Short and Clear Exposition 
of Difficult Words and Terms of Art. The common words were included to indicate spelling,
and they were just glossed, but it was clearly a step towards the general dictionary.
Kersey’s edition of The New World of English Words, in 1706, had about 38,000 words, 
a huge figure for the time. Incidentally, Kersey was also the first quasi-professional 
English lexicographer (Read 2003: 222); the word lexicographer is dated 1658 in the 
Oxford English Dictionary. An Universal Etymological English Dictionary, published in 
1721 by Nathan Bailey, with almost 1,000 pages and about 40,000 entries, claimed to 
have all the words of English, and it did have old and obsolete words, dialect words, 
four-letter words, as well as most (but not all) common words. Bailey’s Dictionarium 
Britannicum: or a more compleat universal etymological English dictionary than any
extant ..., published in 1730, had about 48,000 entries, ‘easily the most comprehensive
English dictionary of its day’ (Osselton 2009: 151), and there were several later editions 
with up to around 60,000 entries. 

Samuel Johnson’s A Dictionary of the English Language, in which the Words are 
deduced from their Originals, and illustrated in their Different Significations by 
Examples from the best Writers, published in 1755, was not produced by an academy 
but it aimed ‘to preserve the purity, and ascertain the meaning of our English idiom’ 
(Preface), and Johnson repeatedly compared his work with the dictionary of the 
Académie française. His dictionary had about 42,000 entries and was greeted as a
splendid achievement: it was ‘what its age demanded—a standard and standardizing 
dictionary which included ... an extensive list of words ... explained by divided and
classified definitions, and illustrated with quotations from authorities’ (Sledd and 
Kolb 1955: 44). There were many abridged versions in the following years that were 
DGU-sized; the practice of producing small dictionaries on the basis of a larger one has 
survived. 

In the United States, at the turn of the nineteenth century, Noah Webster wanted 
to establish American English as a variety in its own right, and he understood that 
the American nation needed a dictionary proposing a standard. His Compendious 
Dictionary of the English Language (1806) was based on dictionaries published in 
England, but it had American words (butternut, caucus, checkers, chowder, hickory, 
skunk, etc.) and Webster's preferred spellings (center, defense, determin, honor, lether, 
musick, program, etc.). It also had about fifty pages of back matter with encyclopaedic 
material (currencies, weights and measures, a history of the world, divisions of time, 
the Jewish, Greek, and Roman calendars, the number of inhabitants in the United 
States, export figures, remarkable events and discoveries, and all the post offices of the 
United States). It was the first in a long lineage of all-purpose dictionaries that can be 
consulted by adults and children on all sorts of subjects. It was one of the first DGUs. 
His American Dictionary of the English Language (1828), the first dictionary to use the 
word American in its title, realized the same ambitions more fully. It had about 1,600 
pages and 70,000 entries, with American words (boss, moose, noodle, prairie, squash, 
tomato, etc.) and spellings, and quotations from American authors. 

Another trend was the compilation of extensive encyclopaedic dictionaries. The 
ancestor was Antoine Furetière’s Dictionnaire universel, contenant généralement tous 
les mots françois tant vieux que modernes et les termes de toutes les Sciences et des Arts, 
published in 1690. It had about 40,000 entries with all the more common words but 
also many specialized terms and abundant encyclopaedic information, like many of 
our modern general dictionaries. The genre continued in the nineteenth and twenti- 
eth centuries in France, particularly with the Larousse dictionaries (see Section 2.3.1). 
In the United States, the Century Dictionary and Cyclopedia of William Dwight 
Whitney, published between 1889 and 1891, had about 500,000 entries. The series of 
unabridged dictionaries derived from Webster's American Dictionary of 1828 were 
also encyclopaedic, and each edition had more words than the preceding one: 175,000 
in 1890, about 400,000 in 1909, and around 600,000 in 1934, probably the dictionary 
of modern times with the greatest entry-count. They had abridged versions that were 
DGUs (see Section 2.3.2). In Britain, John Ogilvie’s The Imperial Dictionary of the English 
Language: A Complete Encyclopedic Lexicon, Literary, Scientific, and Technological, 
derived from a Webster dictionary, was published between 1847 and 1850. It had about 
100,000 words. The tradition of the encyclopaedic general dictionary continued with 
the Chambers dictionaries and more recently with the Collins English Dictionary (see 
Section 2.3.2). 

In the nineteenth century, a new kind of large dictionary developed, the historical 
dictionary as represented by the Grimm brothers’ Deutsches Wörterbuch (see John 
Considine’s Chapter 10 in this volume). In France, Émile Littré’s Dictionnaire de la 
langue française was published in 1863-73. In England, the New English Dictionary 
on Historical Principles, or Oxford English Dictionary (OED), was begun in 1857 and 
finished in 1928, the largest language dictionary of English ever produced. Many 
other languages had their extensive dictionaries on the same model. They were for 
the learned rather than for general users, but they re-enter our story in their abridged 
forms (see Section 2.3.2). 


2.3 DICTIONARIES FOR GENERAL USERS 


2.3.1 DGUs to Educate the Middle Classes 


In the eighteenth and nineteenth centuries, the number of readers increased sharply 
in many European countries. The schooling system was improved. There were more 
authors, more books, and more documents to be bought and read for business or for 
pleasure. The printing presses and the publishing houses were more productive, peri- 
odicals flourished, public libraries were created, as well as academies, and learned 
associations. All this has been documented.⁷ In 1728, Ephraim Chambers wrote (in his 
Cyclopaedia) that there were too many books, and Samuel Johnson is reported to have 
said a few years later that Britain was ‘a nation of readers’.⁸ A vast population of readers 
was indeed emerging. They belonged to an intermediate class that we now call the 
middle class but was then called the bourgeoisie. They were characterized by their social 
aspirations. 

In the nineteenth century, dictionaries were compiled for this growing public. They 
were bought by all middle class households and used by all the members of the fam- 
ily, father, mother, and children. Middle class families used them because they needed 
help to cope with difficulties caused by a lack of education or knowledge, because they 
wanted to avoid being identified as middle class or lower middle class, because they 
wanted to improve their social status. They needed books that would guide them, give 
them access to the codes used by the elite; they needed a grammar, an encyclopaedia, 
an atlas, an almanac;⁹ at the very least, they needed a dictionary that had information 
on words as well as on things, and that was easy to consult. Lexicographers were willing 
to give them what they needed: they are pedagogues (many were schoolmasters or 
authors of school manuals). They are masters in the art of selecting important information 
and encapsulating it in small digestible bits. They believe in the power of the book 
to educate the individual, to participate in his or her individual and social success and 
eventually to improve the community. 

7 On rates of literacy in European societies, see Cavallo and Chartier (1997). 
8 In Boswell’s Life of Johnson (1791). 

The kind of dictionary that was compiled for the improvement of its users and the 
establishment of a better society and that was bought by virtually all families was 
illustrated, if not created, by Webster with his Compendious Dictionary (1806) and 
his American Dictionary of the English Language (1828). But the best example of a 
lexicographer dedicated to the education of the middle classes is probably Pierre 
Larousse. Like Webster, Larousse began by writing school manuals and went on to 
writing, editing, and publishing dictionaries. He was ‘an admirable servant of social 
knowledge, a tireless “schoolmaster” of progressive thinking, aggressively engaged 
in the diffusion of information’ (Collinot and Mazière 1997: 46). His Nouveau dictionnaire 
de la langue française, published in 1856, sold 5 million copies before it was 
replaced by Le Petit Larousse illustré in 1905, which was even more successful. Petit 
Larousse recorded a language that was more contemporary and usual than ancient 
and literary. It had short and simple entries, an encyclopaedic section, and a section 
of Latin quotations. It was a truly democratic dictionary: ‘Larousse worked towards 
the accomplishment of a work whose scientific quality would be worthy of the public 
of lower middle class people, school teachers and petty civil public employees who 
used it to improve their education and to help them work more efficiently towards 
a lay and democratic society’ (Matoré 1968: 127). No wonder Lenin later wanted to 
emulate it for Russian! 

In Britain, the brothers Robert and William Chambers (no relation of Ephraim), 
authors of the Encyclopedia, A Dictionary of Universal Knowledge for the People 
(1859-68), ‘called themselves “publishers for the people”, expressing through this slo- 
gan a particularly Scottish and Presbyterian desire to spread learning to all men and 
women’ (McArthur 1986: 134). Chambers’s English Dictionary (1872) was ‘intended with 
a crusade-like zeal for everybody’ (McArthur 1998a: 135).¹⁰ 


9 Whitaker’s Almanack was first published in 1868. 

10 A strangely dissonant example was, in France, Maurice Lachâtre, who used his Dictionnaire 
français illustré (1858) to combat the dominant values of society and convey revolutionary ideas: ‘Now, 
the mothers and fathers of poor families ... can educate their children, boys or girls, without having to 
resort to a master or professor and without sending the children to a school or college’ (Pruvost 2001: 
71). [‘Désormais, les mères ou les pères de famille pauvres et dénués de ressources ... peuvent faire 
l’éducation de leurs enfants, filles ou garçons, sans qu’il soit nécessaire de recourir à aucun maître ou 
professeur, ni d’envoyer les enfants à l’école ou dans les collèges’] 


2.3.2 DGUs for all the Family 


The DGU of the twentieth century still stands ready to help everyone in the family with 
linguistic and encyclopaedic difficulties, but the lexicographers have reduced ambitions. 
They have no hope of improving society, and they still sing the praises of the 
language only in some emerging nations. General users, however, continue to see their 
dictionary as a book that contains all the truth and nothing but the truth on language 
and on the world, and ultimately as an instrument of socialization. 

In America, the general all-purpose dictionary launched by Webster flourished 
with the ‘college’ dictionaries, originally designed for students but bought and used 
by families. Merriam-Webster’s Collegiate, considered as the model of the genre, was 
first published in 1898. It had about 1,000 pages of dictionary text plus a back section 
with Scottish words, mythology, famous quotations, abbreviations, etc., many pictorial 
illustrations, and a number of usage notes, but no quotations in the entries. The fol- 
lowing editions (the eleventh in 2009, MWCD), as well as the competitors produced by 
other publishers, Funk & Wagnalls New College Standard Dictionary (1947), Webster's 
New World Dictionary of the American Language, College Edition (1953), Random 
House College Dictionary (1968), American Heritage Dictionary, Second College Edition 
(1982), were very similar: they were encyclopaedic, with extensive wordlists and simple 
microstructures. They were very successful: from about 1945 to the 1990s ‘about 2 mil- 
lion... were sold each year’ (Landau 2009b: 361). 

Most of the languages of Europe had their DGUs in the course of the twentieth 
century, many of which were updated and re-edited several times and some of which 
were made available in electronic form. In France, in addition to the yearly edition 
of Petit Larousse, there was Le Petit Robert in 1967, an abridged version of Le Grand 
Robert in six volumes published in 1953 (Dictionnaire alphabétique et analogique 
de la langue française). Petit Robert is non-encyclopaedic, with an elaborate microstructure 
and many literary quotations, obviously for the well educated. It now has a 
yearly edition, like Petit Larousse. In Britain, a new edition of the Chambers’s English 
Dictionary entitled Chambers’s Twentieth Century Dictionary came out in 1901. It 
was partly historical, with meanings in chronological order, and partly encyclopae- 
dic, with an extensive wordlist, a few pictures, appendices with Latin, Greek, and 
foreign quotations, the Greek and Russian alphabets, Roman figures, mathemati- 
cal symbols, etc. It was the ideal companion of the crossword addict and remained 
the main reference for the style of most British publishers for a long time, because 
it was ‘the British dictionary most frequently revised and therefore most responsive 
to present day usage’ (Scragg 1974: 86). The more recent editions give meanings by 
decreasing order of ‘importance’. They have been the official reference for Scrabble® 
competitions. The Collins Dictionary of the English Language (CED), modelled on 
the American college dictionaries, was first published in 1979. It has proper names, 
many technical and scientific words, etymologies, and usage notes. It has been 
very successful. The other small-size dictionaries, the Concise Oxford Dictionary 
(COD; first edition 1911; from eleventh (2004) titled The Concise Oxford English 
Dictionary; twelfth 2011) or the Pocket Oxford Dictionary (1924), were derived from 
the OED: they have no proper names and they are more linguistic than encyclopaedic. 


2.4 CURRENT ISSUES 


Modern DGUs are books in one volume or a CD-ROM, they are relatively small and 
relatively cheap, they are bought and used by the entire population of readers, and they 
are traditionally owned by all families. They are monolingual and alphabetical, they 
have general wordlists and general microstructures, and they give general information 
on words and on things. A study of the use of DGUs in Britain carried out by Longman 
a few years ago concluded: 


Looking up meaning was actually the most frequent use for the dictionary in most 
households, with checking for correct spelling coming second. Reference to the 
dictionary for word meanings was not for common words, but ‘hard words’: 


1. words commonly confused or misused (e.g. aggravate being used to mean 
annoy, instead of make worse; infer being used instead of imply); 

2. encyclopedic words—from science and technology, politics, economics, etc.; 

3. new words (e.g. rate-capping, spreadsheet); 

4. rare or obsolete words (abigail, pellucid). 


The dictionary... was more commonly referred to for word games and to set- 
tle family arguments than for schoolwork or individual interest. (Summers 
1988: 113-14) 


The use of dictionaries in word games, one of the channels through which the language 
is permanently standardized, was confirmed in a survey for Oxford University 
Press: ‘a third of all dictionary use today is by people seeking help in word games’ 
(Augarde 1999: 352). The use of encyclopaedic information in DGUs has never been 
studied. DGUs are seen by their owners and users as the ultimate authority on usage 
in their native language. They are normative for the users if not for the compilers, with 
an influence that depends on the prestige of the publisher, of the lexicographer, and of 
the dictionary. 

Modern DGUs vary in their wordlists, in their microstructural information, and in 
their style of presentation. A full treatment of the questions that their compilers have 
to solve is clearly beyond the scope of the present chapter, but a few issues will be men- 
tioned in the following sub-sections, with examples from English, American, and 
French DGUs. 
2.4.1 A General Wordlist 


2.4.1.1 What Words? 


The wordlist of a DGU is called general because it has all categories of words and 
phrases, all parts of speech, and all varieties. It includes the more common words as 
well as some slang and colloquialisms, regionalisms, scientific and technical words 
(an increasingly important category), words from major literature, idioms, phrases, 
and abbreviations, in varying numbers and proportions. DGUs tend to be generous in 
neologisms, because new words are what many users look for in a dictionary, and cau- 
tious with foreignisms, because the users do not want a ‘distorted’ image of the lan- 
guage. Many DGUs also have proper names. All categories of entries can be in one list 
(CED), or in separate lists. Petit Larousse has a second part for proper names, almost 
half the size of the dictionary, as well as back matter with lists of the names of mem- 
bers of the French académies, the names of Nobel, Booker, and Cannes prize winners, 
etc. MWCD has separate lists for foreignisms, names of people, and names of places. 
COD has lists of names of countries, Prime Ministers, Presidents, and Kings and 
Queens, weights and measures, chemical elements, etc.¹¹ Petit Robert has no proper 
names, but the Robert publishers produced an encyclopaedic DGU in 2009, Dixel, a 
direct competitor of Petit Larousse. 

Among the categories of words that are usually excluded are words found only 
in older texts (except for the more prestigious texts studied in school), rare regional 
words, and highly specialized terms. Vulgar and offensive words such as taboo words 
or ethnic slurs are totally banned from some DGUs. 


2.4.1.2 How Many Words? 


DGUs should have as many words as possible, because they are mainly used to find 
information on the meaning of difficult words. The number of words is important, 
whatever metalexicographers and linguists say. The problem in paper dictionaries—not 
in electronic dictionaries—is that more words means less space for the treatment of 
each word. An early example of a minimal microstructure leaving space for more words 
was the Chambers’s Twentieth Century Dictionary (1929): 


General adj. relating to a genus or whole class: including many species: not special: not 
restricted: common: prevalent: public: loose: vague 


Modern DGUs are more precise and more explicit. The entry for general adj. in the 
Chambers 21st Century Dictionary (1996) is nine lines long and has five numbered senses. 

Of course, even the larger dictionaries have to select their words to keep only the 
more ‘important’, that is, roughly, the more frequent in usage. Metalexicographers say 
that a good wordlist is representative: it has all the function words and a selection of 
lexical words including the more common. Shouldn't DGUs omit the more common 
words, if they are never consulted, to make space for rare words that are really problematic 
for the users? This would seem a sensible position (see Weinreich 1962: 26-7; 
Quine 1973: 249), but the users want their DGU to represent the whole language, and 
this emblematic function of the dictionary is as important for them as its more practical 
functions. 

11 The ‘megastructure’ of dictionaries, i.e. the organization of their different parts, front matter, 
back matter, etc., would deserve a diachronic and synchronic study in itself. 

In most DGUs, meanings are given in decreasing order of ‘importance’. MWCD con- 
tinues the Webster tradition of ordering meanings historically, oldest meaning first. 
Both methods have their advantages: importance makes a better portrait of the con- 
temporary language; chronology provides a useful insight into the evolution of the 
word. In both cases, the lexicographers find it difficult to order meanings in a way that 
is both user-friendly and scientifically impeccable. 

In Britain and in more and more countries, the choice of words, and to some extent 
of meanings, is based on the exploitation of an electronic corpus of texts; in France, the 
United States, and a number of other countries, the selection process has apparently 
remained more traditional. The lexicographers who use a corpus usually design it to 
contain the more representative usages so that the dictionary is focused on the more 
common words and meanings. 

It is difficult to say how many words a DGU should have, because not all dictionaries 
say how many they have, and when figures are given they are difficult to interpret. CED 
says it has ‘3,000,000 words of text’, MWCD says ‘165,000 entries and 225,000 definitions’, 
COD says ‘over 240,000 words, phrases and meanings’. The Oxford Dictionary 
of English, with ‘over 350,000 words, phrases, and definitions’ is probably too large to 
be a typical DGU, and the Pocket Oxford Dictionary, with 120,000, may be too small. 
Interestingly, Petit Larousse (2006) announces 59,000 words and Petit Robert (2006) 
60,000 words and 300,000 meanings. Does French have fewer words than English but 
more meanings for each word? Do French lexicographers split meanings more than 
their English colleagues? 


2.4.2 A General Microstructure 


The microstructural programme of DGUs is said to be general because it includes at 
least a definition or an explanation for each meaning of each word in the wordlist. It 
usually also gives information on part of speech, grammar, pronunciation, and some- 
times on etymology and usage. Some DGUs have examples, pictures, and quotations. 

Definitions are in the classical form, with genus word and differentia, whenever pos- 
sible. They can be in different styles, from the ‘humanist’ to the ‘encyclopaedic’. For 
instance, peach in COD 2004 is: 


the Chinese tree which bears peaches (Prunus persica) 
and in MWCD: 


a low spreading freely branching Chinese tree (Prunus persica) of the rose family 
that is cosmopolitan in cultivation in temperate areas and has lanceolate leaves, ses- 
sile usu. pink flowers borne on the naked twigs in early spring, and a fruit which is a 
single-seeded drupe with a hard endocarp, a pulpy white or yellow mesocarp, and a 
thin downy epicarp 


One is easily accessible but serves only to recognize a more or less familiar concept; the 
other is difficult but introduces useful scientific vocabulary. New defining techniques, 
such as the full-sentence definition of learners’ dictionaries (see Reinhard Heuberger’s 
chapter in this volume) are not used, and no DGU has a limited defining vocabulary. 
Definitions are ideologically as close to the dominant values of the society as possible, 
to accommodate the largest number of users. The differences in the treatment of such 
words as abortion, creation, or God are interesting, but subtle. Modern DGUs want to be 
scientific, and they succeed in being impersonal; some would say that they are boring. 

Grammar information and usage information are minimal, because the users are sup- 
posed to master the rules of their native language. Verbs can be transitive or intransi- 
tive, irregular verbal and adjectival forms are given, and irregular plurals, but not much 
more. The fact that, for example, loath can only be used predicatively, that galore must 
come after the noun, that budge can only be used negatively, that place in the sense ‘place 
where one lives’ must be preceded by a possessive, that responsible is used with for is rarely 
indicated. At best the information can be retrieved in the examples. American college 
dictionaries and the more recent British DGUs, however, have usage notes: Chambers 21st 
Century Dictionary has a few, for example on the difference between magic and magical, 
and COD has many, for example on the fact that hers should not be written her’s, on the 
difference between heterogenous and heterogeneous, or on a comparison of illegal, unlawful, 
and illicit. Do British users need to encode more than they used to, or are they less 
skilled in encoding than they used to be, as the note on hers seems to indicate? 

Pronunciation is given in most DGUs, for all words or only for the more difficult. 
It used to be given in an ad hoc code but the more recent DGUs use the International 
Phonetic Alphabet (IPA). COD adopted it in 1990. IPA is what most metalexicogra- 
phers recommend, but an ad hoc system, as in MWCD, is not a bad solution in a DGU, 
because most users use only one dictionary and therefore can quickly get used to its 
coding system. 

Many DGUs do not have quotations. MWCD has a few, ‘to show words in typical 
contexts’. Petit Robert has many, and they are clearly models of prestigious literary style 
to be admired, ‘scraps of the purple’ [‘des lambeaux de pourpre’], as Littré said in his 
Preface, rather than illustrations of normal usage. For example, orgueilleux is illus- 
trated by a quotation from Stendhal’s Le Rouge et le noir: 


Cet être que l’on a vu à Verrières si rempli de présomption, si orgueilleux, était 
tombé dans un excès de modestie ridicule. [This creature whom we saw at Verrières 
so filled with presumption, so arrogant, had fallen into an absurd extreme of modesty. 
Translation Scott Moncrieff.] 


A beautiful sentence, but probably not very illuminating for someone who does not 
know what orgueilleux means. 

All DGUs have usage labels for domain, if only because they serve to differentiate 
senses, and labels for register to mark words or usages that are or can be awkward or 
offensive. In the United States, some use the latter abundantly. Landau (2000: 115) cites 
the American edition of the Encarta World English Dictionary (1999): 


minority (sense 4) ‘OFFENSIVE TERM offensive term for minority member, now avoided 
by careful speakers because it can cause offense (offensive). 


Etymology is often present, if only minimally. Many users like it, because it is typi- 
cally ‘prestigious’ knowledge, and it can also be useful for memorizing words. Some 
DGUs have pictorial illustrations (see Annette Klosa’s chapter in this volume): MWCD 
has a few black-and-white drawings; Petit Larousse has about 5,000 colour drawings 
and photographs as well as maps. COD, CED, and the more modern editions of the 
Chambers dictionaries have none. Many of us have fond memories of looking at the 
illustrations in our parents’ dictionary when we were children, and that may be a good 
reason for having illustrations in a DGU: to attract children to the dictionary. True, 
there are now many more sources available for illustrations of various kinds, and of 
much better quality. 

Conspicuously absent from DGUs are related words: synonyms, antonyms, hypo- 
nyms, hyperonyms, and the like (see Lynne Murphy’s chapter in this volume). Either 
the lexicographers think that the users know enough about those words, or the pub- 
lishers hope to sell specialized dictionaries. Some related words are used in the defini- 
tions, for example mammal to define badger, shy to define timid and vice versa, steal in 
the entry thief, calf in the entry veal, or heavy in the entry light. But other related words 
are rarely mentioned: strength and strengthen in the entry strong, thief and theft in the 
entry steal, veal in the entry calf, salary in the entry income, mare in the entry stallion, 
and many, many others. Here again, Petit Robert stands apart: the entry fureur, 
for example, mentions enthousiasme, exaltation, inspiration, possession, transport, 
frénésie, rage, courroux, enrager, acharnement, furie, impétuosité, and violence in the 
sub-entries where they belong and printed in bold to be used as cross-references. The 
treatment is not systematic, but it is useful. A comparable treatment of lexical relations 
is available in English but only in learners’ dictionaries or in thesauruses. 

On the whole, linguistics has had little influence on DGUs. The only example of 
a DGU that reflected the research activities of linguists may be the Dictionnaire 
du français contemporain (1966), which was clearly structuralist and described 
the use of words in their networks of paradigmatic and syntagmatic relations. 
But, interestingly, it was as much neglected by the public as it was admired by 
metalexicographers. 
2.4.3 A Style Adapted to General Users 


DGUs used to provide their information in codes and abbreviations to save space. 
For example, in COD 1964: 


plŭmb² (-m), v.t. & i. Sound (sea), measure (depth, lit. & fig.), with plummet, whence 
~'LESS (-ml-) a.; make vertical; (intr.) work as plumber. [f. prec.] 


But the tendency is now to be more explicit. COD 2004 has no abbreviations, apart 
from part of speech, and it gives an example: 


plumb¹ ■ v. 1 measure (the depth of a body of water) 2 explore or experience fully or 
to extremes: she had plumbed the depths of depravity 3... etc. 


This is more space-consuming, but facilitates consultation. 

On the whole, recent DGUs are more user-friendly, on paper and of course even 
more so in electronic format: text layout, use of fonts, use of colour, etc., greatly 
improve the clarity and the readability of the text. This is particularly evident in 
Britain, where the same evolution can be seen in the latest learners’ dictionaries, 
perhaps as a consequence of studies of dictionary use that stressed users’ difficulties 
with dictionary codes and dictionary style in general. 


2.5 FUTURE DEVELOPMENTS 


Many DGUs have been best sellers, although it is difficult to have precise figures. 
Some went through several editions over extended periods of time, often several 
decades, sometimes more than a century. There is no record of a lexicographer hav- 
ing ever made a fortune, but the business has been reasonably profitable for the pub- 
lishers. This period of relative prosperity, however, is now coming to an end. In 2009, 
Landau (2009b: 382) noted that the sales of American college dictionaries were ‘only 
a quarter of what they were in their heyday’, and the sales of DGUs in Europe are 
also declining. It is not that dictionaries are not needed: the need for information on 
things and for guidance on linguistic usage has never been so high.¹² But the dictionary 
on paper or CD-ROM is no longer the preferred source: most people in need of general 
information will now opt for the Internet. The drop in sales figures that began with the 
larger dictionaries and encyclopaedias in the late twentieth century¹³ is now spreading to 
smaller dictionaries and DGUs. 


12 In Britain, ‘Byron Primary in Bradford school has fewer than one in 20 pupils speaking English 
as their first language’ (The Daily Telegraph, 28 February 2012). 

13 As I finish writing this chapter, I find on the Internet, dated 14 March 2012: ‘After 
244 years ... Encyclopaedia Britannica has decided to stop publishing its famous and weighty 32-volume 
print edition’. 

Publishers have tried to adapt. The first reaction was, naturally, to cut production 
costs. The dictionaries that used to be produced from scratch or extracted from the 
text of larger dictionaries by teams of expert lexicographers over extended periods and 
updated regularly now tend to be derived more or less mechanically from databases or 
existing dictionaries by employees who do not need to be experts. No modern diction- 
ary is known by the name of its main compiler; Chambers, Larousse, Robert, Webster, 
and others are only memories of a time that is no more. The second reaction of the 
publishers was to make their dictionary texts available on the Internet, on hand-held 
devices, or on cell phones, and count on advertisements to cover the costs, which 
of course precipitates the demise of the paper or CD-ROM dictionary. Sites such as 
Wiktionary, FreeDictionary, YourDictionary, Dictionary.com, or OneLook have their 
own homemade entries, or entries from major dictionaries, including DGUs. 

Electronic dictionaries have well known advantages, but online DGUs are not ideal. 
In some situations, a paper dictionary will be easier, quicker, and more comfortable to 
consult, which is particularly important for a DGU. Also, online dictionaries are not 
always clearly identified: the user does not always know where the information comes 
from, whether the text was written by an expert, or indeed on what authority. And, 
because an online dictionary has virtually no limit to what it can contain, and because 
it is only virtual, it cannot pretend to represent the language in the same way as a paper 
or even a CD-ROM dictionary does, by its very presence, by the fact that it is a material 
object that can be seen, possessed, and handled. 

The DGU may be the category of dictionary that will survive longest in paper form, 
particularly in countries where people are less well equipped and less adept at using 
computers and the Internet, and in communities where people think that they need 
to possess a paper DGU and do not believe that it can be replaced by online sources. 
In France, the sales of Petit Larousse and Petit Robert have certainly declined but they 
are still quite high; the launching of the yearly edition is still an important event. But 
even the DGU is doomed in the long run. The age of the dictionary that represents 
the language is coming to an end. The DGU as we know it will eventually disappear, 
to be replaced by online sources that will provide the same information, and more, 
but will not play the same social role. It is strange to think that, at the same time as 
the major European languages are on the way to stopping their production of DGUs, 
many of the 6,000-odd languages of the world do not yet have one, and may never 
have one. 


2.6 CONCLUSION 


DGUs are one of the late products of the 5,000-year history of lexicography. There 
were many others, some of which flourished only in some societies, and some existed 
only for a limited period. DGUs are certainly the most universal and the most signifi- 
cant of all. They have something of all the dictionaries that have been produced in the 
course of this history: bilingual or multilingual, thematic, specialized, standardizing, 
extensive or etymological, dictionaries of proper names, of hard words, of synonyms,
etc. DGUs have something of all of them, and they are also partly grammars,
encyclopaedias, almanacs, and atlases.

It is difficult to say exactly when the DGU appeared, just as it is difficult to say when 
the category of general users appeared. The histories of their emergence were slow and 
irregular. Many dictionaries, since the beginnings of lexicography, have had some of 
the features of the DGU, the size, the price, the pedagogic ambition, the contents, or 
the user-friendliness. Some were explicitly for humble users, replacing or complementing
the formal education to which these people did not have access. Cawdrey’s Table
Alphabeticall in 1604 was for ‘Ladies, Gentlewomen, or any other unskilfull persons’
and some of its successors identified similar readerships. These dictionaries were for 
the general users of the times, only there were fewer general users. The DGU really 
became a genre when a given language was reasonably standardized, particularly in 
spelling, and when there was a large enough community of potential users who needed 
and could afford a book that would serve them as a guide to the niceties of language and
to a better knowledge of the world. In Europe and the USA, it flourished in the nine- 
teenth century at the same time as a middle class appeared that needed to improve its 
education. 

The DGU has not changed much in form and content since the nineteenth century. It 
is difficult to define because it matured in slightly different ways in different linguistic 
communities, but everywhere it has played the same social role: to represent the lan- 
guage whose mastery was the key to social success. It is, or has been, a familiar feature 
in virtually all households and its users see it, or have seen it, as an indispensable object, 
even though they may use it rarely. They think it is infallible, even though they have little
idea of how it is produced.

The DGU has often been dismissed by linguists: it is not a prestigious lexicographi- 
cal product, it does not have the scientific authority of the larger dictionaries that are 
often its parents, and it is not compiled by academics. It is incomplete, simplified, and 
as much concerned with accessibility as with accuracy. It is a commercial object. Yet it is
an interesting artefact for historians and for sociologists. Its presence and its success 
are evidence that it fills, or has filled, a quasi universal need: in all countries, people 
believe in the importance of language in the process of social integration, and they 
count, or have counted, on a dictionary to be the authoritative guide that will give them
the answers they need. In its role as a source of information on words and on things, the
DGU is being advantageously replaced by online sources; in its role as a portrait of the 
language and an instrument of social accomplishment, it may prove difficult to replace. 


3.1 INTRODUCTION


More than seventy years have passed since the publication of A. S. Hornby’s Idiomatic 
and Syntactic English Dictionary (1942), which was reprinted a few years later by Oxford 
University Press as A Learner's Dictionary of Current English (1948). Hornby’s lexico- 
graphic milestone firmly established a distinct genre of dictionary which has been at 
the forefront of lexicographical innovation during the past few decades. And the evolu- 
tion of learners’ dictionaries has certainly not come to its end. Electronic media and the 
internet have created new opportunities but also new challenges for both dictionary 
makers and dictionary users. This chapter aims to sketch the remarkable history and 
development of monolingual learners’ dictionaries (MLDs), focusing on their salient 
features, and on current issues and future trends in the field. 

Foreign learners’ demands on a dictionary differ fundamentally from those of native
speakers (cf. Herbst 1990: 1379); thus MLDs show a number of peculiarities (cf. Section
3.3). Among their most striking features is the use of a restricted and simplified lan- 
guage for the definitions. In contrast to dictionaries aimed at native speakers, MLDs 
provide detailed guidance on grammar and usage, for example syntactic patterns 
and usage notes.¹ Only learners’ dictionaries include a vast number of example sentences
and collocations, which also increases their usefulness for encoding, that is, for
language production. On the other hand, historical and etymological data are often 
excluded, as they are not regarded as very helpful for learners (cf. Strevens 1987: 77). 

From a didactic point of view, the utility of MLDs for intermediate and advanced 
learners is undisputed. Only monolingual dictionaries have the extra merit of 


1 Some dictionaries for native speakers nowadays have usage notes as well, e.g. the current edition
of The American Heritage Dictionary of the English Language (5th ed., 2011).
introducing the user to the lexical system of the foreign language (cf. Béjoint and
Moulin 1987: 104), thus promoting a more rapid expansion of both the active and pas- 
sive vocabulary. Several studies have shown that advanced learners using monolingual 
dictionaries indeed obtain better results (cf. Rundell 1999: 41). Much to the chagrin of 
educators, however, learners often prefer bilingual dictionaries because they find them 
easier to use. So-called ‘bilingualized’ dictionaries, that is, monolingual dictionar- 
ies with translations for each entry and sub-entry, combine the central elements of 
monolingual and bilingual reference works and are thus suited to reconciling the con- 
flicting demands of language learners and teachers; they will be discussed briefly in 
Section 3.4.1. 

Foreign learners of English have a tremendous choice of dictionaries tailored to their 
needs, in both print and electronic form (cf. Section 3.4.2). The lucrative EFL (English 
as a Foreign Language) market is currently dominated by six major publishers, five 
of which are British: Oxford University Press, Pearson Longman, Collins Cobuild, 
Cambridge University Press, Macmillan ELT, as well as the American publishing 
house Merriam-Webster, whose first MLD entered the market only recently in 2008. 
The following section gives a short overview of the development of the English learners’ 
dictionary, focusing on the history of the print editions. 


3.2 A BRIEF HISTORY OF ENGLISH 
LEARNERS’ DICTIONARIES 


The genesis of the learners’ dictionary² lies in the endeavours of three teachers of
English as a foreign language, one of whom worked in India (Michael West) and two in
Japan (H. E. Palmer and A. S. Hornby) (cf. Jackson 2002: 129). Each of them was involved 
in research projects dedicated to the teaching of English. West became a major con- 
tributor to the ‘vocabulary control’ movement, which was concerned with establishing 
systematic criteria to select the most useful words for language learning. Together with 
James Endicott, he compiled The New Method English Dictionary (NMED, 1935), some- 
times labelled the first monolingual learners’ dictionary (Cowie 1999: 33). NMED con- 
tained only about 24,000 headwords defined using a limited vocabulary of 1,490 words, 
which made it suitable for intermediate rather than advanced learners. It was intended 
primarily to meet the decoding needs of its users, whereas its usefulness for encoding 
was limited by the absence of syntactic guidance, the insufficient treatment of inflec- 
tion, and an idiosyncratic pronunciation scheme (Cowie 2009b: 393). 

Palmer investigated the grammatical patterning of words, in particular verb pat- 
terns. In 1938, he published his small but innovative dictionary A Grammar of English 


2 The most thorough treatment of the development of MLDs (up to the turn of the millennium) is
Cowie (1999). 
Words. The dictionary won praise for its subtle verb-pattern scheme, arranging the patterns
into numbered groupings (‘divisions’) of related types. The usefulness of Palmer’s
dictionary was, however, limited by the fact that it focused on simple sentence pat- 
terns, whereas complex types involving subordinate clauses were not included (Cowie 
1999: 28). 

Hornby’s work focused on collocations and idioms, research that culminated in 
the publication of the first general-purpose advanced-level learners’ dictionary, the 
aforementioned Idiomatic and Syntactic English Dictionary (Hornby et al. 1942). After 
World War II, Oxford University Press became interested in Hornby’s dictionary and 
republished it in 1948 as A Learner's Dictionary of Current English, changed in 1952 to 
The Advanced Learner’s Dictionary of Current English (ALD1). The original use of the 
terms ‘idiomatic’ and ‘syntactic’ emphasized Hornby’s commitment to the productive
(i.e. encoding) function of the dictionary (cf. Cowie 2009b: 398). At the same time,
Hornby succeeded in meeting learners’ receptive needs, adapting the vocabulary of the 
Concise Oxford Dictionary (3rd ed. 1934) as the basis for the headwords. ALD1’s treatment
of word-combinations set the path for subsequent phraseological dictionaries,
and its use of IPA symbols (to indicate RP pronunciations) became the model of choice 
in EFL lexicography. 

Hornby’s dictionary remained the unrivalled authority in the following three dec- 
ades. In 1963 the second edition (ALD2) was published, marking a shift of emphasis 
towards a broader coverage of scientific and technical terms, as well as a significantly 
greater number of examples, whereas verb-pattern schemes and the treatment of idi- 
oms and collocations remained unaltered (Cowie 2009b: 403). The third edition, then 
called Oxford Advanced Learner's Dictionary (OALD3, 1974), was strongly influenced 
by the grammatical research conducted by the Survey of English Usage (University 
College London), especially by its Grammar of Contemporary English (Quirk et al. 1972).
For instance, OALD3’s subdivision of verb patterns was rearranged according to the 
organization in the Grammar, and increased by a factor of almost three. However, the 
arbitrary letter–number codes used (e.g. [VP 6C]) failed to reflect the syntactic structure
of a given pattern, making reform of this feature necessary (Cowie 2009b: 409).

The year 1978 saw the publication of the first real competitor to Hornby’s dictionary,
Paul Procter’s Longman Dictionary of Contemporary English (LDOCE). LDOCE1
had a great impact on EFL lexicography and provided various innovative features. 
It also drew on the grammatical resources made available by the Survey of English 
Usage and gave new impetus to West’s controlled defining vocabulary. The editors of 
LDOCE1 claimed that all definitions were compiled by means of a limited vocabulary
of about 2,000 words, although the actual number of words and senses used was sig- 
nificantly higher, mainly due to polysemous defining terms and derivations that were 
additionally used. What made the dictionary perhaps most remarkable for its time was 
its highly systematic organization of grammatical categories and codes (Fontenelle 
2009: 414). However, mastering the intricacies of Longman’s coding scheme was often 
found by learners to be a daunting task, and subsequent editions of LDOCE adopted 
a simpler approach with fewer codes (cf. Fontenelle 2009: 417). LDOCE1 also deserves 
credit for being the first large-scale computerized dictionary of English, having been
used in many natural language processing applications.

The third big player on the EFL market, John Sinclair's Collins Cobuild English 
Language Dictionary (Cobuild), was first published in 1987. Although it initially 
received a lot of criticism, it was later widely imitated, especially as regards the use of 
corpus evidence (Moon 2009: 436). Cobuild can thus justly be regarded as the first 
corpus-based dictionary of English, applying an approach that is standard practice 
in present-day (EFL) lexicography. The corpus became the sole basis for the selection
of the example sentences in Cobuild1, which was emphasized by the eye-catching
slogan on the front page: ‘Helping learners with real English’. Sinclair addressed this
issue in the preface to the dictionary, arguing that ‘[invented examples] give no reli- 
able guide to composition in English and would be very misleading if applied to that 
task’ (1987b: xv). This was a rather extreme position, not shared by many other lexi- 
cographers (cf. Bergh et al. 1998: 42). Cobuild1’s innovative use of full sentence defini- 
tions made the explanations easier to read and understand, but was criticized for being 
long-winded and repetitious (Herbst 1990: 1382). The dictionary was also distinguished 
by its physical appearance. A slim marginal column to the right of the dictionary text 
(the ‘extra-column’) contained grammatical labels, synonyms, antonyms, as well as
superordinates of senses and phrases, thus reducing non-discursive matter in the main 
text (cf. Moon 2009: 453). 

The same year, 1987, also saw the publication of LDOCE2, and OALD4 was published 
two years later in 1989. Both editions introduced a few changes but largely maintained 
their individual approaches (Moon 2009: 453). The year 1995 turned out to be one 
of the most prolific years for EFL lexicography. Within a period of six months, four 
MLDs appeared on the market: three new editions, OALD5, LDOCE3, and Cobuild2,
as well as an entirely new publication, the Cambridge International Dictionary of
English (CIDE). In the tradition of Cobuild1, all four dictionaries were corpus-based,
a selling factor that was strongly emphasized in the blurbs. LDOCE3 and Cobuild2 
were the first dictionaries to indicate frequency information, that is, data on the quan- 
titative distribution of lexical items within a given corpus. The Longman lexicogra- 
phers marked the 3,000 most common words of written and spoken English, while 
Cobuild2 originally covered a remarkably high frequency range of the 14,700 most 
common words.³ Another focus of the 1995 generation of MLDs was on improving the
accessibility of the microstructure. To facilitate the finding of the individual senses of 
polysemous entries, Longman and Cambridge were the first to provide index terms or 
phrases, referred to as ‘signposts’ (LDOCE3) and ‘guide words’ (CIDE1). Remarkably, 
LDOCE3 almost entirely desisted from using codes for its syntactic patterns, instead 
spelling out this information in full, a method common in bilingual dictionaries 


3 Kilgarriff (1997a: 150) has argued that Cobuild2’s information on less frequent terms was not 
very reliable. Concerns about the reliability of the data are likely to have prompted Collins to limit its 
frequency indications to the 3,000 most common terms from the 2003 edition of Cobuild onwards. 
(cf. Herbst 1996: 329). For example, a pattern of the verb ‘find’ was marked as ‘find sb
doing sth’ rather than (T + obj + v-ing) or (V n -ing), which were the corresponding
codes in CIDE1 and Cobuild2. The most obvious advantage of this policy was that
users hardly needed to remember syntactic codes, and many learners greatly appre- 
ciated Longman’s approach, which was maintained in later editions. As the title of 
CIDE1 suggests, Cambridge put special emphasis on international varieties of English,
covering ‘British, American, Australian and other usages, pronunciations, spellings 
and grammatical patterns’ (1995: viii). 

In retrospect, one might argue that by 1995 the English learners’ dictionary had wit- 
nessed its most profound innovations. Needless to say, subsequent print editions would 
keep introducing subtle changes, for example a more thorough treatment of colloca- 
tions and neologisms, a refined layout (e.g. the use of colours), an overall increase of 
headwords, accompanying CD-ROMs, etc.—but the key features of state-of-the-art 
learners’ dictionaries were essentially established by 1995.⁴ This does not, of course,
mean that the publication of new dictionaries came to a halt. In 2002, a fifth competitor
entered the market, the Macmillan English Dictionary for Advanced Learners
(MEDAL1). Bogaards has rightly stated that MEDAL1 wisely adopted the features that
had proven successful in past editions of learners’ dictionaries (Bogaards 2010: 21): clear 
guide word menus for the sake of accessibility, frequency information (on the 7,500 
most common words), and a refined defining vocabulary of 2,500 terms. The second 
edition of MEDAL appeared in 2007. 

It was only in 2008 that the first American learners’ dictionary was published,
Merriam-Webster’s Advanced Learner’s English Dictionary (MWALED1).
Unsurprisingly, there was a special focus on American English vocabulary and usage
(cf. preface). MWALED1 provided a significantly greater number of example sentences
than its British competitors (around 160,000 according to Merriam-Webster), but otherwise
did not add any new elements to this type of dictionary (cf. Bogaards 2010: 25).
This lack of innovation has recently led Bogaards to speculate whether the evolution of 
MLDs has come to its end (2010: 25). My answer to that question is a firm ‘no’, with the 
important addition that there is arguably more scope for improvement within the elec- 
tronic medium (cf. Section 3.4.2). 

The standard of today’s monolingual English learners’ dictionaries is certainly
admirably high, and learners can choose among the following current editions:
OALD8 (2010), LDOCE6 (2014), Cobuild8 (2014), CALD4 (2013),⁵ MEDAL2
(2007), and MWALED1 (2008). The following section will focus on the salient features
of MLDs, at the same time trying to establish the criteria for successful learners’
dictionaries.


4 This impression is confirmed by Yamada (2010: 150), who has pointed out that the 1995 generation 
of MLDs marked the beginning of a convergence within EFL lexicography. 

5 Cambridge changed the title of CIDE to Cambridge Advanced Learner's Dictionary (CALD) from
the second edition onwards. 


3.3 SALIENT FEATURES OF LEARNERS’
DICTIONARIES


The parameters of a ‘perfect’ learners’ dictionary have been widely discussed (cf. 
Herbst 1996) and are not undisputed, be it the question of authentic example sentences 
vs. invented ones, the coverage of grammatical topics, or the selection of the defining 
vocabulary. This section investigates the key features of monolingual English learners’ 
dictionaries, also tackling a few debated issues and areas in which there is still scope for 
development. 


3.3.1 Defining Vocabularies 


In deference to the learners’ reduced lexical proficiency, limited defining vocabularies 
of usually no more than 3,500 words are the basis for the compilation of the defini- 
tions in MLDs. This feature, first introduced in LDOCE1 in 1978, is an integral part of 
the success of the English learners’ dictionary. But while the advantages of defining 
vocabularies are evident, their disadvantages are sometimes overlooked. A defining 
vocabulary policy places a number of constraints on the lexicographers, not unusually 
resulting in ‘clumsy and unnaturally circumlocutory’ definitions (Carter 1994: 127). In 
many cases, it takes more space to define a headword using simple terms than using 
more complex and succinct language. Despite these drawbacks, the usefulness of a 
defining vocabulary is indisputable for most scholars. As Landau has stated, ‘there is 
no doubt that a controlled vocabulary makes for a duller text, and there is probably 
some merit to the charge of awkwardness, but the foreign learner may be better served 
by sacrificing all else to basic understandability of sense’ (Landau 1984: 343).⁶

Defining vocabularies require a certain degree of complexity that allows lexicog- 
raphers to compile precise definitions and yet be simple enough to be understood by 
the user. The most frequent terms of a language are likely candidates for this purpose, 
as learners can be expected to be familiar with them. Unfortunately, frequency usu- 
ally goes hand in hand with polysemy and ambiguity (cf. Hartmann 1989: 184), which 
makes the selection of these terms even more problematic. All current MLDs occasion- 
ally depend on words from outside the defining vocabulary to improve the accuracy of 
the definitions. This is tolerable as long as these terms are marked or explained imme- 
diately. Oxford has taken an innovative approach in OALD8, keeping the definition 
phrases simple and additionally providing technical terms (usually in brackets). This 
ensures that the definitions are easily comprehensible while the learners’ vocabulary 
range is increased: 


6 This section is not retained in Landau (2001).
(1) cricket a game played on grass by two teams of 11 players. Players score points
(called RUNS) by hitting the ball with a wooden bat and running
between two sets of vertical wooden sticks, called STUMPS
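
The check implied above, that words from outside the defining vocabulary should be spotted so they can be marked or explained, can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `out_of_vocabulary` and the tiny `DEFINING_VOCABULARY` set are hypothetical and do not reproduce any publisher's actual list or tooling. Matching is by exact word form, so a real check would also need to handle inflections and the separate senses of polysemous words.

```python
import re

def out_of_vocabulary(definition, defining_vocabulary):
    """Return the words in `definition` that are not in the defining vocabulary.

    Words are lower-cased; digits and punctuation are ignored.
    Each flagged word is reported once, in order of first appearance.
    """
    flagged = []
    for word in re.findall(r"[a-z]+", definition.lower()):
        if word not in defining_vocabulary and word not in flagged:
            flagged.append(word)
    return flagged

# Hypothetical toy defining vocabulary, for illustration only.
DEFINING_VOCABULARY = {
    "a", "game", "played", "on", "grass", "by", "two", "teams", "of", "players",
}

print(out_of_vocabulary(
    "a game played on grass by two teams of 11 players", DEFINING_VOCABULARY))
print(out_of_vocabulary(
    "a game played by two teams with wooden stumps", DEFINING_VOCABULARY))
```

Words returned by the function are candidates for marking (for example in small capitals) or for an immediate gloss, as in the definition of cricket above.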


3.3.2 Definitions 


Definitions are, due to their importance, one of the most debated areas in MLDs, and 
perhaps the one most in need of improvement. The compilation of appropriate defini- 
tions is challenging for a number of reasons. There is, for example, the choice of the 
most suitable method of defining. Some headwords are best described analytically, 
whereas others are better defined by means of typifying or rule-based definitions 
(cf. Heuberger 2000: 15ff). In addition, the lexicographer needs to pay attention to a 
number of well-established principles of defining, for example the avoidance of circu- 
larity and the need for objectivity. 

Interestingly, there are a number of lexicographic shortcomings that seem to affect 
dictionaries for learners more than those for native speakers. Dictionaries for the lat- 
ter target audience generally present a more apposite description of the headwords, 
partly due to the greater length of their entries. In Heuberger (2000: 19ff), definitions 
of selected lexical fields in MLDs are analysed systematically, revealing five recurring 
shortcomings: irrelevance, ideology, inaccuracy, insufficiency, and incoherence. These 
points will be discussed briefly in the following. 

Printed learners’ dictionaries suffer from space limitations, so definitions need to 
be short. A definition can only provide few semantic features of a headword, which are 
meant to enable the learner to identify the lexeme and distinguish it from related ones. 
Learners’ dictionaries tend to waste space by giving irrelevant and redundant informa- 
tion, as the following example illustrates: 


(2) salmon n (C) a large fish with pink flesh. People sometimes fish for salmon with
a rod and line as a sport. (OALD5)


The information that salmon are fished for with a rod is not distinctive as this is true of 
a great variety of fish. It is certainly not a feature characteristic of salmon, nor does it 
contribute to a better understanding of the meaning of the term. 

While many users continue to endorse the notion of dictionaries as objective author- 
ities on language issues, linguists have identified ideology in dictionaries in some 
detail (e.g. Kachru and Kahane 1995). Landau (1984: 303)⁷ has argued that ‘dictionary
definitions represent the views and prejudices of the established, well-educated, upper 


7 This section is not retained in Landau (2001).
classes, generally speaking’. Anthropocentrism, the focus on nature’s utility or harmfulness
for humans, is just one example of such bias:


(3) vulture n a large ugly bird with an almost featherless head and neck, which
feeds on dead animals... (LIED1) 


By calling vultures ‘ugly’, the Longman Interactive English Dictionary⁸ (1996) is very
subjective and suggests a purely human-centred conception of beauty. The greater
amount of anthropocentrism in learners’ dictionaries compared to native speaker dictionaries
is presumably correlated to their principle of offering ‘simple’ explanations, as
information on utility or harmfulness is usually easily comprehensible in terms of lexis 
and complexity (cf. Heuberger 2003). 

A comparison of selected definitions in dictionaries for learners and native speak- 
ers is also likely to show that the former are often less accurate. Again, it seems that the 
lexicographers’ efforts to provide simple explanations sometimes result in their being
simplistic—concerns about accuracy certainly being less of an issue in MLDs: 


(4) desert a large area of land where it is always very hot and dry, and there is a lot of
sand (LDOCE5)


Nights in deserts can be quite cold, which renders the term ‘always’ inappropriate, and 
there are several types other than sand deserts. Even though incorrect facts like these 
are unlikely to prevent learners from understanding the definitions, such false ideas 
are perpetuated by continuing to include them in dictionaries. Therefore, compilers of 
learners’ dictionaries should pay more attention to accuracy. 

A smaller number of definitions in MLDs do not provide sufficient facts for the user 
to clearly identify the headword defined. The following example arguably allows for 
false interpretations on the part of the user: 


(5) a farm animal kept for its meat (CIDE1)


CIDE1’s definition is supposed to illustrate the term pig, but it also applies to calves,
bullocks, and even fowl. It is, admittedly, impossible to define all headwords unmistak- 
ably, but the referents usually have unique features which, if emphasized, make a clear 
identification possible. 

Svensén (1993: 125) has pointed out that a coherent defining style is more than just an
aesthetic factor: it is ‘the only way to be consistent in the choice of the genus proximum, 
to choose the correct distinctive features, and to avoid redundancy, contradiction, etc.’. 


8 LIED1 is an electronic learners’ dictionary on CD-ROM, based on the Longman Dictionary of
English Language and Culture (1993, ed. Della Summers). 
Learners’ dictionaries sometimes read as if related lexemes have been defined independently
of each other, often with the consequences predicted by Svensén:


(6) gnat n a small fly that bites people and animals (MWALED1)


(7) mosquito n a small flying insect that bites the skin of people and animals and
sucks their blood (MWALED1) 


MWALED1’s definitions for ‘gnat’ and ‘mosquito’ suggest that the two headwords have
not been analysed and defined together; neither are they worded in a uniform manner 
(‘bites people’ vs. ‘bites the skin of people’), nor do they share the same genus term (‘fly’ 
vs. ‘insect’). 

In summary, definitions are among the information categories with the greatest 
scope for improvement. A significant number of definitions fail both to help the learner
and to please the critic. Most of the shortcomings criticized could be overcome by
adhering to a clearer defining policy, using componential analyses to identify distinc- 
tive features, and defining related terms together. 


3.3.3 Illustrations 


Pictorial illustrations, which are a standard feature in learners’ dictionaries nowadays, 
are mainly used to support definitions in explaining meaning. Apart from this primary 
function, illustrations are used to group and disambiguate words of a lexical field (e.g. 
tools), depict the parts and components of referents and contrast the various senses of 
polysemous and homonymous terms. Finally, (full-colour) illustrations are an impor- 
tant sales argument, as they make the dictionary more appealing to the eye. 

Like definitions, illustrations may succeed in depicting a headword well or may 
fail to do so. The selection of the features to be included is similar to the choice of 
semantic features to be provided by a definition. The two images in Figure 3.1 show 
some weaknesses typical of this information category in MLDs. Longman’s illustra- 
tion for the verb ‘shoot down’ is arguably ambiguous and would benefit from addi- 
tional features, for example a shooting enemy jet fighter, to disambiguate it from the 
more general concept of exploding. Similarly, the picture of a turtle fails to make 
clear whether this reptile is a land or a sea animal, as the turtle’s natural habitat, 
water, is not portrayed. 

A comparison of the current MLDs shows that the publishers have often chosen different
terms to illustrate. Pictures should generally be employed more systematically
for terms that are likely to cause confusion for the learner (e.g. due to their complexity 
or ambiguity), and the aesthetic value of illustrations ought not to surpass their lexico- 
graphic relevance. Longman already demonstrated in LDOCE3 that there is apparently 
no word class or type of word that cannot be illustrated effectively (cf. the innovative 
pictures dedicated to sounds and prepositions). Greater emphasis should be placed 
on thematic illustrations,⁹ which can show relationships that are difficult to describe
verbally.

FIGURE 3.1 Ambiguity of illustrations in LDOCE3 (‘shoot down’) and LDOCE5 (‘turtle’)


3.3.4 Examples 


Example sentences fulfil several important functions in learners’ dictionaries, for both 
receptive and productive purposes; they help to clarify meanings (and thus support 
definitions) and serve as syntactic models to be followed by dictionary users. They are 
also a vehicle to indicate collocations, and—as a side effect—serve to counteract ‘the 
dry effect of an entire book of abstract analyses of words as words’ (Kipfer 1984: 78). 

The selection of example sentences is arguably one of the most controversial ques- 
tions in EFL lexicography. There are basically three approaches to this issue. A. S. 
Hornby, the pioneer of the modern learners’ dictionary, spoke out in favour of invented
examples, mainly because they can include lexical and grammatical detail designed to 
help language learners (quoted in Cowie 1999: 134). As mentioned earlier, John Sinclair 
was among the sharpest critics of this position, arguing in the preface to Cobuild1
that only authentic examples can show how a word is actually used, whereas invented 
examples ‘have no independent authority or reason for their existence . . .; usage cannot 
be invented, it can only be recorded’ (Sinclair 1987b: xv). The third approach, taken by 
all current MLDs, is actually a combination of the first two policies; the lexicographers 
rely on authentic example sentences, but usually edit them to remove difficult terms 
from the original corpus material and adapt them to reflect typical structures and collocations.
This approach combines the best elements of Hornby’s and Sinclair's positions
and seems to be the most promising from the viewpoint of the learner.¹⁰

Even though example sentences have an important semantic function in MLDs, 
dictionary makers have traditionally decided against applying defining vocabularies 


9 OALD8 and CALD3 can serve as examples in this respect; all illustrations portray groups of
semantically related lexemes or the parts and components of terms.
10 Even Collins has departed from Sinclair's original approach, as stated in Cobuild
(2008: xi): ‘Examples themselves remain close to the corpus, with minor changes made so that they are 
more successful as dictionary examples’. 
to this information category. The decision whether less frequent terms should remain 
within example sentences is largely based on lexicographers’ intuition. It is not unusual 
for unsuitable terms to slip through. The following examples taken from three recent 
editions of learners’ dictionaries illustrate this point: 


(8) omnipresent So how did a diminutive Australian soap star get to be an omnipresent 
icon of style and beauty? (CALD3) 

(9) trend Everybody seems to be following the trend for sleek shiny hairstyles. 
(MEDAL2) 

(10) house I spent the weekend just puttering around the house. (MWALED1) 


Learners’ knowledge of terms like ‘diminutive’, ‘sleek’, or ‘puttering’ ought not to be 
taken for granted. In including example sentences, the frequency of the terms they use 
should be considered, a task which again needs to be corpus-based. In keeping with 
Bogaards, at least those examples that are intended to support the definitions (rather 
than to illustrate syntactic patterns) should be kept correspondingly simple (Bogaards 
1996: 298). Besides, it has been argued that lexicographers should pay greater attention 
to the encoding function of examples, especially with regard to less frequent terms (Xu 
2008: 410). 
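
The corpus-based check suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list `FREQUENT_WORDS` is a hypothetical toy list standing in for a frequency list derived from a corpus, the function name `flag_rare_words` is invented here, and no lemmatization is attempted, so inflected forms are matched only if they are listed themselves.

```python
import re

# Hypothetical toy list standing in for a corpus-derived list of frequent words.
# A realistic list would contain several thousand items.
FREQUENT_WORDS = {
    "so", "how", "did", "a", "an", "australian", "soap", "star", "get", "to",
    "be", "icon", "of", "style", "and", "beauty", "everybody", "seems",
    "following", "the", "trend", "for", "shiny", "hairstyles", "i", "spent",
    "weekend", "just", "around", "house",
}


def flag_rare_words(example, headword, frequent_words=FREQUENT_WORDS):
    """Return the words in `example` that are neither the headword
    nor members of the frequency list (in order of first appearance)."""
    tokens = re.findall(r"[a-z]+", example.lower())
    flagged = []
    for token in tokens:
        if token == headword.lower() or token in frequent_words:
            continue
        if token not in flagged:
            flagged.append(token)
    return flagged


print(flag_rare_words(
    "Everybody seems to be following the trend for sleek shiny hairstyles.",
    "trend",
))
```

Applied to examples (8) to (10), a check of this kind would draw the lexicographer's attention to ‘diminutive’, ‘sleek’, and ‘puttering’; whether such a word is then replaced remains an editorial decision.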


3.3.5 Grammar 


The treatment of grammatical information is another area where dictionaries for learn- 
ers and native speakers differ significantly. Unlike native speaker dictionaries, MLDs 
not only indicate the part of speech of a headword, but they also give detailed informa- 
tion on syntactic patterns and irregular inflections. In particular, learners’ dictionaries 
focus on the syntactic behaviour of verbs, ‘since verb syntax is essentially the syntax of 
the clause’ (Jackson 1994: 180). Unfortunately, learners often fail to make full use of this 
information category, mainly because they only reluctantly read the prefatory matter 
and are not prepared to study the coding systems adequately. 

Grammatical information is difficult to provide in a learners’ dictionary, for at least 
three reasons (cf. Stark 1990: 22). First, there is the problem of arranging the data, as 
grammatical information does not lend itself to an alphabetical ordering in the 
dictionary.11 Second, lexicographers are supposed to avoid redundancy by not repeating 
the information generally found in grammar books,12 and third—and perhaps most 
importantly—the grammar codes need to be clear and effective. Ideally, the codes 


11 Cf. Collins’ attempt to indicate grammar within the so-called extra column in earlier editions of 
Cobuild. 

12 This point is a debatable one; it has even been suggested that learners should be provided with all 
the grammatical guidance they need without being referred to grammars at all (cf. Jackson 1985: 53). 
[Figure 3.2 not reproduced: dictionary entries for the verb ‘want’, with abbreviated pattern codes in Cobuild6 and with patterns spelled out in full in LDOCE5 (e.g. ‘want something done’, ‘want somebody to do something’, ‘want doing’).] 

FIGURE 3.2 Excerpts from (respectively) Cobuild6 and LDOCE5 illustrating the use of pattern 
codes 


should have a mnemonic value which enables the learner to interpret and remember 
them more easily. 

Coding systems are among the features where the dictionaries on the market are 
most distinct, and arguably one in which the greatest improvements have been made 
in the past few decades. As already mentioned, former editions of learners’ diction- 
aries often included grammar codes that were not suggestive of the syntactic pattern 
itself. The occasional dictionary user in particular was likely to forget those inscrutable 
notations after not having consulted the reference work for some time. Many of today’s 
learners’ dictionaries spell out grammatical information in full and are thus much 
more comprehensible and user-friendly in this respect. 

The current generation of MLDs continues to cater to the diverse grammati- 
cal needs of its users, though the differences between the individual reference works 
have become noticeably smaller. Learners who require detailed and explicit gram- 
matical information have traditionally been well-served with a Cobuild dictionary. 
Interestingly, the complexity and range of Cobuild6 (2008) has been greatly reduced 
compared to Cobuild2 (1995). Of the seventy-four(!) grammatical patterns included in 
the second edition, only thirty-six have remained in Cobuild6. For that edition, Collins 
also decided to remove its long-established extra column, which was used to indicate 
grammatical information in previous editions.13 Longman, on the other hand, has long 
focused on the development of an easy-to-understand coding system, spelling out syn- 
tactic patterns in full since its 1995 edition. The excerpts in Figure 3.2 taken from recent 
Cobuild and Longman dictionaries illustrate these different approaches with regard to 
pattern codes. 

Longman’s policy of spelling out pattern codes has been adopted by its competitors 
OALD8 and MEDAL2, whereas CALD4 and MWALED1 continue to include them. 
Clearly, both approaches have their justification, and the choice of the most suit- 
able dictionary with regard to grammar depends on learners’ individual needs and 
preferences. 


13 Studies have shown that learners hardly used the extra column as a source of grammatical 
information (Bogaards and van der Kloot 2001: 118). 
3.3.6 Usage Information 


Example sentences and grammar codes are designed to indicate ‘how a word can 
be used, but not how it can’t’ (Whitcut 1985: 77). This is one of the main functions of 
so-called usage notes, first included in LDOCE2. Primarily intended for encoding, they 
provide additional grammatical information, highlight subtle differences in meaning, 
or focus on pragmatic aspects.14 All MLDs currently on the market provide helpful 
and accurate guidance within this information category. The usefulness of this feature 
is mainly limited by the fact that the notes occur rather irregularly and sporadically. 
A given set of words is usually only discussed in one of the dictionaries available. 

Usage information is also indicated by means of labels. They are employed to give 
information on such factors as currency and temporality (‘obsolete’), frequency of use 
(‘rare’), regional variation (‘Australian’), technical or specialized terminology (‘astron- 
omy’), restricted or socially unacceptable usage (‘taboo’), style (‘informal’), and sta- 
tus (‘non-standard’) (cf. Landau 2001: 217-18). The criticism voiced in connection with 
usage notes also applies to labels: their main weakness is that they are often employed 
rather inconsistently, that is, not for all headwords to which they are pertinent. 


3.3.7 Collocations 


Collocations constitute a major challenge for the language learner as they are often 
unpredictable. They also present difficulties for the dictionary maker, as they entail a 
number of methodological problems. Apart from choosing suitable candidates, lexi- 
cographers have to decide how and where to indicate collocations. In principle, they 
can be entered under the collocational base (i.e. the semantically more autonomous 
word) or the collocator (the semantically more dependent element). Space restric- 
tions in print dictionaries usually prevent lexicographers from providing collocations 
within the entries of both their components. In addition, a suitable vehicle needs to be 
chosen; collocations are sometimes listed separately within an entry, but more often 
they are incorporated into examples and definitions or discussed within usage notes. 
Dictionary users often cannot be sure where to find collocations; a universal format, be 
it with regard to placement or typography, has yet to be realized. 

Every learners’ dictionary has its individual approach to this problematic area. 
Cobuild6 and LDOCE5, for example, give collocations separate status in the 
microstructure, listing (and, if necessary, explaining) them in a self-contained box (Figure 3.3). 
Thus, users can locate the data immediately without looking through the entire entry. 


14 For a recent discussion of pragmatic information in learners’ dictionaries, see Yang (2007). 

15 Walker (2009: 298) has noted in this context that lexicographers’ awareness regarding the nature 
of collocations ought to be raised: ‘In fact, it can be argued that some of [the] limitations may result 
from a lack of clarity concerning what exactly constitutes a collocation, and a lack of understanding of 
the linguistic features and processes which influence the process by which collocations are formed’. 
[Figure 3.3 not reproduced: the ‘Word partnership’ box for ‘key’ in Cobuild6 and the collocation box for ‘harm’ in LDOCE5, which lists patterns such as ‘do harm (to something)’, ‘cause (somebody/something) harm’, and ‘do more harm than good’.] 

FIGURE 3.3 Collocation boxes in Cobuild6 (‘key’) and LDOCE5 (‘harm’) 


This also allows for shorter and clearer definitions and examples, as lexicographers do not 
need to integrate the collocational information there. It should be pointed out, however, 
that only headwords with a broad collocational range are equipped with this feature. 


3.3.8 Pronunciation 


All current MLDs provide phonetic transcriptions by means of the International Phonetic 
Alphabet (IPA), a tradition which dates back to ALD1. Native-speaker dictionaries 
sometimes rely on their individual notation systems which are based on respelling, compelling 
users to learn every system anew if they consult dictionaries by different publishers. The 
claim that the IPA is too difficult for the average user (cf. Kipfer 1984: 120) is hardly justi- 
fied, and most foreign learners of English become familiar with it in the course of their lan- 
guage studies anyway. With regard to phonetic transcription systems, the current MLDs 
hardly differ from each other. All dictionaries except for MEDAL2 indicate American 
English alternative pronunciations, and MWALED3, of course, focuses on this variety. 


3.3.9 Accessibility 


A well-designed access structure is arguably a prerequisite for the effective use of any 
reference work. Current MLDs are intended for both decoding and encoding, which 
poses conflicting demands on lexicographers: ‘Whereas for receptive purposes the 
learner should be guided from unknown elements to familiar ones, for productive 
goals he should be able to start from familiar words in order to find words which are 
new to him’ (Bogaards 1996: 280). Dictionaries intended for encoding should ideally 
be arranged semantically rather than alphabetically, since an alphabetical 
ordering blurs the meaning relation between headwords.16 However, MLDs are organized 


16 Compare the arrangement in the Longman Language Activator (LLA2), a dictionary designed 
especially for production. 
[Figure 3.4 not reproduced: the CALD3 entry for ‘absorb’, in which each sense is introduced by a signpost (e.g. TAKE IN, UNDERSTAND, INTEREST VERY MUCH), and the MEDAL2 entry for ‘gentle’, which opens with a numbered menu of senses.] 

FIGURE 3.4 ‘Signposts’ in CALD3 (‘absorb’) and ‘menu words’ in MEDAL2 (‘gentle’) 


alphabetically, meaning that dictionary makers need to provide tools that compensate 
for this policy. For instance, cross-references are supposed to enable the learner to find 
unknown words starting from familiar ones. Example sentences, collocations, illus- 
trations, and a few other information categories indirectly fulfil this purpose as well. 
However, language production with only a monolingual dictionary at hand remains a 
time-consuming and laborious procedure, and the translation of complex texts with- 
out the use of bilingual dictionaries remains illusory. 

A more thorough discussion of the access structure of learners’ dictionaries is 
beyond the scope of this paper. Merely one feature designed to facilitate the location 
of senses of polysemous terms will be mentioned here in brief. The introduction of 
‘signposts’ and ‘guide words’ (before each meaning) in the 1995 generation of learners’ 
dictionaries (i.e. LDOCE3 and CIDE) has proven to be a seminal innovation that may 
facilitate considerably the retrieval of word senses.17 MEDAL’s use of ‘menu words’ (at 
the top of the entry) has a very similar function. Unfortunately, only the aforemen- 
tioned three publishers currently rely on this feature. The examples in Figure 3.4 are 
taken from CALD3 and MEDAL2. 


3.3.10 Front and Back Matter 


User research has shown that the front matter of MLDs is hardly ever read (Herbst 
1996: 339), mainly because it is either perceived as too long or not attractive enough in 
terms of layout and style. Besides, as Kirkpatrick (1989: 754) has suggested, ‘it is widely 
believed that one dictionary is much like another’. As a result, many learners remain 
ignorant of some (innovative) features of their dictionaries and are likely not to use 
them to their full potential. The current generation of learners’ dictionaries provides 
‘visual keys’ to their reference works, that is, annotated sample pages of the dictionary 
in combination with verbal descriptions. This keeps the guides shorter than traditional 
discursive descriptions and at the same time makes them more appealing to the eye. 


17 Nesi and Tan (2011: 90) have recently evaluated this feature largely positively. 
Similarly, many learners are unaware of the extra information their dictionaries 
contain in the back matter (cf. Jackson 1994: 38). Publishers continue to provide this 
section in their current learners’ dictionaries, although there seems to be little consen- 
sus on what features to include. Arguably, a list of irregular verbs is a useful appendix 
in any MLD. Similarly, learners ought to be informed about which words they need to 
know when using their dictionary (i.e. a defining vocabulary list). A concise treatise on 
essay writing as well as formal and informal letters (including emails) is another help- 
ful contribution in making the reference books more suitable for productive purposes. 
Finally, there are a number of ‘traditional’ appendices which may not be consulted very 
often but which should prove helpful on some occasions: lists of geographical names, 
prefixes and suffixes, numbers, weights, and guidance on punctuation. 


3.4 SPECIAL TYPES OF LEARNERS’ DICTIONARIES 


3.4.1 Bilingualized Learners’ Dictionaries 


The bilingualized English learners’ dictionary (BLD) is a comparatively recent 
development which aims to combine the central or ‘best’ elements of monolingual and 
bilingual reference works (cf. Béjoint 1994: 73). Such dictionaries usually have an entry 
structure very similar to MLDs (i.e. definitions and examples compiled in English), 
but additionally provide a translation for every sense into the learner's native language. 
This means, of course, that bilingualized learners’ dictionaries need to be compiled for 
every given foreign language, which is one of the reasons why only a few BLDs have 
been published to date.18 

Laufer and Hadar (1997: 192ff) have pointed out that these hybrid dictionaries have 
proven effective in several studies, although the results vary with the skills of the user 
group. In particular, less advanced learners have achieved better results with bilingual- 
ized reference works than with monolingual or bilingual ones. But advanced learners 
are also likely to profit from the translations, as they reassure and reinforce the 
learner’s decision about the meaning and use of a given lexeme (Laufer and Hadar 1997: 195). 
The only problem with BLDs from a didactic point of view is that their users tend to 
skip the monolingual parts, often going straight to the translations and thus missing 
exposure to the L2 (Pujol et al. 2006: 203f).19 


18 While MLDs can be sold worldwide, there is obviously only a restricted market for bilingualized 
learners’ dictionaries. They are, for example, very popular with Chinese-speaking EFL learners (Chen 
2011: 161), whereas other languages like German still await the publication of the first BLD. 

19 ‘Deferred’ bilingualized dictionaries try to counteract these habits by not presenting the 
translation immediately (cf. Pujol et al. 2006). 
The electronic medium lends itself particularly well to the realization of BLDs, not 
only because translations increase the size of print dictionaries. Printed BLDs are often 
structured on the basis of one L2 language, for example English.20 This means that 
translations, which are provided for each sense, cannot be taken as a starting point, 
making these dictionaries more useful for decoding than encoding. However, users of 
computerized reference works can search for both the English term and its native lan- 
guage equivalent and are thus able to start from either language. This is an enormous 
advantage, and a well-compiled bilingualized EFL dictionary in electronic form can 
therefore combine the best features of monolingual and bilingual dictionaries without 
sacrificing any of their individual strengths. 
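
The point about starting from either language can be illustrated with a small sketch. It assumes a toy English-French data set (the glosses are simplified for illustration) and two hypothetical helper functions, `build_reverse_index` and `look_up`; it is not a description of how any published BLD is implemented.

```python
from collections import defaultdict

# Toy English-French data; a real BLD would hold full definitions and examples.
ENTRIES = {
    "bone": [
        {"gloss": "hard part of the skeleton of a human or animal",
         "translation": "os"},
        {"gloss": "hard part inside a fish", "translation": "arête"},
    ],
}


def build_reverse_index(entries):
    """Map each native-language translation to the
    (headword, sense number) pairs that carry it."""
    index = defaultdict(list)
    for headword, senses in entries.items():
        for number, sense in enumerate(senses, start=1):
            index[sense["translation"]].append((headword, number))
    return dict(index)


def look_up(query, entries, reverse_index):
    """Return (headword, sense number) pairs for a query in either language."""
    if query in entries:
        return [(query, n) for n in range(1, len(entries[query]) + 1)]
    return reverse_index.get(query, [])


REVERSE = build_reverse_index(ENTRIES)
print(look_up("bone", ENTRIES, REVERSE))   # search by English headword
print(look_up("arête", ENTRIES, REVERSE))  # search by translation
```

In print, only the first kind of look-up is available without a second alphabetical sequence; the reverse index is what the electronic medium adds at little cost.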


3.4.2 Electronic Learners’ Dictionaries 


The basic outline of the salient features of MLDs presented in Section 3.3 applies to both 
printed and electronic reference works. Even the most recent versions of electronic 
learners’ dictionaries are essentially conversions of existing print dictionaries, mean- 
ing that their main differences do not concern the entry texts but rather additional fea- 
tures enabled by technology (cf. de Schryver 2003: 146). All six big publishers of MLDs 
offer free online versions of their current reference works,21 sometimes restricted in 
function (e.g. CobuildO). Electronic versions of learners’ dictionaries are also avail- 
able on optical data carriers (CD-ROMs or DVDs) that are often sold in combination 
with the corresponding print dictionaries. They can usually be installed on local hard 
disks, which minimizes access times. The recently published Cobuild7 is the first MLD 
to come with a mobile app for the iPhone, Android, and mobile web browsers. In Asia 
especially, handheld mobile devices are also very popular (cf. Nesi 2009: 460ff). 

Electronic MLDs have several significant advantages, the most important of which 
are briefly discussed here. To begin with, limitations of space are much less of an issue 
than in print dictionaries. As already pointed out, however, this potential has up to 
now hardly ever been used for the entry text itself. Instead, electronic learners’ 
dictionaries provide extra features such as (visual) thesauruses,22 exercises, vocabulary 
builders, language games, etc. Among the greatest benefits of the electronic medium is 
the provision of what has often been termed ‘multimedia features’. The integration of 
audio (and sometimes video) material allows for a more natural and vivid description 
of the language than a book could ever achieve. In particular, recorded pronunciations 
by native speakers are a highly useful feature that modern MLDs often provide for both 
British and American English. 


20 A space-consuming alternative is to organize the hybrid dictionary the way some bilingual ones 
are structured, with half of the reference work dedicated to each of the two languages in question. 

21 These online dictionaries (including their URLs) are all listed within the references. 

22 Cf. CALDO. 
Another major advantage of electronic dictionaries (mainly on optical storage 
devices) is that they can include self-contained corpora. The potential usefulness of 
such corpora for receptive and even more so for productive purposes ought not to be 
underestimated. Even for less frequent terms, learners are usually provided with a size- 
able number of examples that can serve as syntactic models for correct usage or as the 
basis for lexical analyses. Besides, these corpora are believed to play an important role 
in learning, as they may foster learners’ interest in the language and its structures. 

Thanks to sophisticated retrieval systems, access times may be reduced23 
significantly in the case of electronic MLDs. The search process is also facilitated and refined 
by features such as Boolean operators, wildcard searches, hypertext searches (instant 
cross-references), filter searches, etc. Finally, the user-friendliness of electronic learn- 
ers’ dictionaries is often improved by allowing for various customizations (e.g. with 
regard to typeface, layout, etc.) and by their large-scale linkability with other 
software (e.g. word processing tools). A more detailed investigation of electronic MLDs is 
beyond the scope of this paper. For a more thorough discussion of electronic (learners’) 
dictionaries, see, for example, Nesi (2009) and de Schryver (2003). 
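
To make the search features mentioned above more concrete, the following sketch combines a wildcard pattern with filter conditions joined by a Boolean AND. The entry records, including the part-of-speech values and labels, are invented for illustration, and the function `search` is hypothetical; actual electronic MLDs may work quite differently.

```python
from fnmatch import fnmatchcase

# Invented sample records; the labels are illustrative only.
ENTRIES = [
    {"headword": "absorb", "pos": "verb", "labels": set()},
    {"headword": "absorbing", "pos": "adjective", "labels": set()},
    {"headword": "harm", "pos": "noun", "labels": set()},
    {"headword": "putter", "pos": "verb", "labels": {"informal"}},
]


def search(entries, pattern="*", pos=None, label=None):
    """Return headwords matching a wildcard pattern AND every filter given."""
    results = []
    for entry in entries:
        if not fnmatchcase(entry["headword"], pattern):
            continue  # wildcard search on the headword
        if pos is not None and entry["pos"] != pos:
            continue  # filter by part of speech
        if label is not None and label not in entry["labels"]:
            continue  # filter by usage label
        results.append(entry["headword"])
    return results


print(search(ENTRIES, "absorb*"))
print(search(ENTRIES, "absorb*", pos="verb"))
```

The first call returns every headword beginning with ‘absorb’; the second narrows the same wildcard search to verbs.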


3.5 CONCLUSION AND OUTLOOK 


The history of the English learners’ dictionary is indeed a success story. More than 
seventy years after the publication of Hornby’s ISED, the genre is alive and kicking, 
with new publications appearing regularly and sales running into tens of millions. In 
addition, EFL lexicography has contributed fruitfully to dictionary making in gen- 
eral, as many important innovations of recent decades have emerged from this corner 
of the field. Despite this praise, EFL lexicography cannot afford to rest on its laurels 
and is likely to witness some profound changes within the foreseeable future. Many 
of these changes and innovations will be driven by technological developments, for 
example the growing popularity of smartphones. Although printed learners’ diction- 
aries will—despite all former prophecies of doom—continue to be published in the 
years and decades to come, the scope for development is arguably greater as regards 
the electronic medium. Learners are more and more used to obtaining their informa- 
tion online or through mobile devices, preferably free of charge. The majority of EFL 
publishers already make their current learners’ dictionaries available for free, and this 
business model (financed by advertisements) is likely to keep flourishing. 

Rundell (2010) predicts that another major change might concern the well- 
established format of the MLD, which needs to adapt more flexibly to the needs and 
demands of its target audience: 


23 However, the short time needed to retrieve information may prove detrimental to its retention 
(cf. Dziemianko 2010: 258). 
The one-size-fits-all model was a boon for publishers: the same ALD could be sold 
all over the world to students whose needs, capabilities and cultural backgrounds 
were in reality quite diverse. But this approach is out of step with the Zeitgeist and 
is starting to break down. The demand among consumers generally is for products 
that match their individual needs more precisely—an expectation that is already 
transforming businesses like television and popular music. In dictionary terms, 
this implies both customization and personalization. (Rundell 2010: 171f) 


These plausible demands for adaptation can certainly be realized more efficiently 
within the electronic medium, where individual access to dedicated corpora, colloca- 
tion banks, L2 translation systems, etc. is technically feasible. ‘Personalization’ also 
implies that the dictionary should ideally match the user’s changing needs, for example 
by calibrating the level of L1 support (Rundell 2010: 172). 

A large-scale electronic learners’ dictionary in its own right, originally conceived for 
the electronic environment and continuously kept up-to-date, is yet to be published. 
The same is true of visionary features such as supplementary multimedia corpora for 
learners, combining searchable text, audio, and video material cross-referenced with 
each other (cf. de Schryver 2003: 169). Currently, such ideas are still the dreams of 
lexicographers, although not completely illusory ones. They illustrate that the poten- 
tial of MLDs is yet to be exhausted, and make us curious about what lies ahead. These 
desiderata as well as the critical comments voiced in Section 3.3 should not blur the fact 
that the standard of state-of-the-art monolingual dictionaries for foreign learners of 
English is already enviably high, arguably unrivalled by any other language. It is safe to 
predict that EFL lexicography will maintain its supremacy and continue to be innova- 
tive in the years and decades to come. 
4.1 HISTORY 


Boisson (1996) and Boisson et al. (1991) describe in great detail the history of bilin- 
gual lexicography. Contrary to what is frequently assumed, bilingual dictionar- 
ies came later than monolingual dictionaries and the lexicographic traditions in 
Mesopotamia and in Ancient Egypt were clearly originally monolingual because 
the great civilizations in the Middle East were, as is pointed out by Van Sterkenburg 
(2003b: 8-9), self-centred and did not focus on their neighbouring countries. The 
Babylonians then created dictionaries that met practical needs: their lists of words 
were carved on clay tablets in order to make the Sumerian language more easily 
accessible. In Europe, historians generally agree that the first dictionaries can be 
traced back to the explanations of difficult words inserted into Latin manuscripts in 
the Middle Ages. They fulfilled a vital function in teaching and in the transmission 
of knowledge. Van Sterkenburg (2003b) clearly shows that religion in Europe sig- 
nificantly contributed to the development of lexicography, given that it was essential 
to create pedagogical tools to find solutions to the meaning of Latin words in reli- 
gious texts. Glosses, or explanations provided in the vernacular language for pas- 
sages in the Bible that were hard to understand, began to be written in the margins 
or between the lines of medieval texts. These marginal or interlinear glosses were 
then subsequently regrouped, alphabetically or thematically, into collections of 
glosses, or glossae collectae (see Van Sterkenburg 2003b: 9-10). These explanations of 
hard words in Old English or Old French can be seen as precursors of modern bilin- 
gual dictionaries. Béjoint (this volume, especially section 2.2) provides more details 
about the history of dictionaries for general users (see also Considine, Chronology, 
this volume). 
Four major functions are generally assigned to bilingual dictionaries, depending on 
whether the user is using the dictionary to understand or translate a text written in the 
foreign language (L2) or in the first language (L1): 


1. Reception in L2 
2. Reception in L2 + production in L1 
3. Production in L2 
4. Reception in L1 + production in L2 


The ‘reception-oriented’ dictionaries are also often called ‘passive dictionaries’, since 
they are designed to facilitate the understanding process in a decoding perspective. 
‘Production-oriented’ dictionaries are also called ‘active’ dictionaries, given that the 
focus is on help provided to express an idea, in an encoding perspective. 

Most of the dictionaries discussed later in this article are what Adamska-Sałaciak calls 
‘bidirectional bilingual dictionaries’ (Adamska-Sałaciak, this volume), insofar as they 
serve the needs of both language communities at once, the same dictionary acting as an 
L2-L1 dictionary for one group of users and as an L1-L2 dictionary for the other group of 
users. There are also monodirectional bilingual dictionaries, however, which are aimed 
exclusively at native speakers of the L1 or the L2. Many such monodirectional dictionaries 
will be L2-L1 dictionaries, which users consult more readily when they are confronted 
with an L2 word or expression that they would like to understand. The Dutch-French 
dictionary published jointly by Van Dale and Le Robert for French speakers is a case in 
point (Le Robert and Van Dale Dictionnaire néerlandais-français, 1988). 

Lexicographers and linguists such as Hanks (2000) and Kilgarriff (1997b) have 
examined the crucial issue of word senses and polysemy. They both arrived at the same 
conclusion that word meanings do not exist as a checklist, if they exist at all. Kilgarriff, 
for instance, demonstrates that the occurrence of a word in context, in the form of 
citations or key-word-in-context (KWIC) concordances, provides a more operational 
definition than the concept of ‘word sense’, which he argues is not a workable one. Word 
senses for him are abstractions over clusters of word usages, and the lexicographer’s 
task is to examine large numbers of concordances, divide them into clusters and work 
out the elements which explain why the members of a cluster belong together. 
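
For readers unfamiliar with the format, a KWIC concordance can be produced with very little code. The sketch below is a simplified illustration: the function `kwic` is hypothetical, the context window is counted in words rather than characters, and only exact matches of the keyword are found, so inflected forms such as ‘bones’ are ignored.

```python
def kwic(sentences, keyword, width=3):
    """Return (left context, keyword, right context) triples for each
    exact occurrence of `keyword`, with `width` words on either side."""
    lines = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            # Compare without surrounding punctuation and ignoring case.
            if token.strip(".,;:!?").lower() == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append((left, token, right))
    return lines


SENTENCES = [
    "She fell and broke a bone in her foot.",
    "The archaeologists found fragments of bone.",
]

for left, node, right in kwic(SENTENCES, "bone"):
    print(f"{left:>25}  {node}  {right}")
```

Grouping the resulting lines into clusters, and deciding what the members of each cluster have in common, is the interpretive step that remains the lexicographer's task.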

For practical purposes, it is clear that dictionaries cannot provide their readers with 
concordances grouped into clusters (this is more conceivable in electronic products, but 
users cannot be expected to be turned into lexicographers and to ‘make sense’ of hun- 
dreds of KWIC lines). Dictionaries are therefore based on an oversimplification which 
posits that words have enumerable, listable meanings that are divisible into discrete units. 
Such constructs come in handy because dictionary users tend to work best with clear-cut 
distinctions and categories that are classified into distinct, well-organized boxes. One of
the key questions the lexicographer then faces is related to the distinction between lumping
and splitting. Lumping means treating slightly different patterns of usage as a single
meaning, while splitting means separating slightly different patterns of usage into
distinct meanings. The burning question
whether the lexicographer should apply a lumping or a splitting strategy does not just 
apply to monolingual dictionaries, however. A related question for bilingual lexicogra- 
phers is whether sense divisions should be based upon the source language or the target 
language. Indeed, many bilingual dictionaries divide the semantic space of source lexical 
items as a function of the target language. This has a profound implication for bilingual 
lexicography, since a word which is regarded as monosemous in a monolingual diction- 
ary may be considered as polysemous in a bilingual dictionary because the target lan- 
guage makes distinctions which are non-existent in the source language. 

Consider the entry for bone in the Macmillan English Dictionary for Advanced 
Learners (second edition, 2007): 


bone 1. [countable] one of the hard parts that form a frame inside the body of a
human or animal. This frame is called a skeleton.
    She fell and broke a bone in her foot.
    Cook the fish, then carefully remove the bones.
    He was thin, and his bones stuck out.
2. [uncountable] the substance that bones are made of.
    The archaeologists found fragments of bone.
2a. [only before noun] made of bone
    A chess set with carved bone pieces


The basic distinction in this monolingual dictionary is between the countable and the 
uncountable uses of the word bone. In comparison, a bilingual dictionary such as the 
Oxford-Hachette French Dictionary (Corréard and Grundy 1994; henceforth OH 1994) makes
distinctions that are based upon the existence of different potential translations:


bone I n 1 (of human, animal) os m; (of fish) arête f; made of ~ en os; chicken on/off the
~ poulet à l’os/désossé; to break a ~ casser un os; to break every ~ in one’s body se
rompre les os (...)


The first sense in the bilingual dictionary is subdivided into two subsenses correspond- 
ing to the various meronymic (part-whole) relations: the default translation for bone 
will be os when a bone is part of a human or of any animal. When talking about a fish, 
however, bone will translate as arête. Note that the use of animal in the metalinguistic
bracketed material may be confusing, since a fish is also an animal. It would therefore 
probably be better to provide an even more synthetic treatment, as in the following 
entry from the Collins Robert French Dictionary (Atkins and Duval 1978, 2nd edition, 
1987, henceforth CRFD2), which clearly provides the default translation first, followed
by the specific exception, which uses the metalinguistic indicator [fish] to guide the 
user: 


bone 1 n a os m; [fish] arête f


The concepts of monosemy and polysemy are treated very differently in monolingual 
and bilingual dictionaries. The examples above illustrate cases where a word is per- 
ceived as monosemous by a native speaker, in which case the monolingual dictionary 
provides only one definition, while the bilingual dictionary makes use of metalinguis- 
tic elements to distinguish non-interchangeable translations. Duval (1986: 96) shows 
that the reverse situation is also possible, that is, a word that is perceived as polysemous 
by native speakers may be treated as entirely monosemous in a bilingual dictionary if 
the target language equivalent accounts for all the possible uses of the source word in 
all contexts. Duval uses the English word tarsus as a case in point, a word which would 
require three distinct senses in a monolingual dictionary: 


tarsus Noun 


a. (Anatomy) the bones of the ankle 
b. (Anatomy) a part of an eyelid 
c. (Zoology) a part of an insect’s leg 


Because the French equivalent tarse has exactly the same distinctions (The Encarta 
French Dictionary gives the following three senses: Anatomie: partie postérieure du 
squelette du pied constituée par une triple rangée de sept os courts; Zoologie: en ento- 
mologie, dernier segment constituant la patte d’un insecte, composée de plusieurs 
segments; Anatomie: lame fibreuse formant le cartilage de la paupière, lui donnant sa
forme et sa rigidité), a bilingual dictionary entry could be as simple as the following 
one: 


(English-French) tarsus tarse masc 


(French-English) tarse n masc tarsus 


4.3 METALANGUAGE AND THE BILINGUAL 
DICTIONARY 


Metalanguage is used by the lexicographer to talk about words. As is pointed out by 
Duval (1986: 98), bilingual dictionaries at the beginning of the twentieth century were 
not very different from monolingual dictionaries and tended to focus on the reception 
approach. The metalinguistic infrastructure was virtually non-existent and the disambiguation
of non-interchangeable translations was done via semi-colons, while equivalent,
interchangeable translations would be separated by commas. The goal for the
lexicographer was to provide as many equivalents as possible in order to allow the user 
to figure out the meaning of the source language item. No efficient tool was made avail- 
able to the user to disambiguate all these equivalents, however. This started to change 
in the second half of the twentieth century, when the user-as-a-producer became the 
focus and bilingual dictionaries started making more systematic use of metalanguage 
in order to guide users and help them differentiate the various target language equiva- 
lents. Ideographic elements were even introduced to allow source and target language 
users to refer to the concept via symbols that are not tied to a specific language. The use 
of asterisks as indicators of register used to mark non-neutral words and expressions is 
a case in point, as in the Collins Robert French Dictionary (CRFD2):


• * indicates that the expression, while not forming part of the standard language,
  is used by all educated speakers in a relaxed situation but would not be used in a
  formal essay or letter, or on an occasion when the speaker wishes to impress (he
  laughed himself silly* il a ri comme un bossu* or comme une baleine*)

• ** indicates that the expression is used by some but not all educated speakers in
  a very relaxed situation. Such words should be handled with extreme care by the
  non-native speaker unless he is very fluent in the language and is very sure of his
  company (what the hell is he doing?** Qu’est-ce qu’il peut bien fabriquer?* or
  foutre?**)

• *** means ‘Danger!’ Such words are liable to offend in any situation, and therefore
  are to be avoided by the non-native speaker. (baiser ... vt b (***) to screw***, lay***,
  fuck***)


Other dictionaries may use different strategies, with different types of ideographic 
elements (the Oxford-Hachette French Dictionary—OH 1994—uses a system of white, 
black, and semi-black circles to tag words that are informal (○) or vulgar/taboo (●)). They
may also opt for abbreviations such as fml, infml, vulg., etc. 

Metalinguistic information covers many aspects of a word’s use and not just register 
information. It traditionally includes: 


• part of speech (n, v, adj, adv, prep...)
• countability (e.g. c for countable noun and ¢ for uncountable noun)
• grammatical information (transitivity information, number and gender: e.g. vt for
  transitive verb; vi for intransitive verb; nf for feminine noun, mpl for masculine
  plural noun, inv for invariable...)
• style labels (fml for formal, inf for informal...)
• regional variants (US for American English, Brit for British, Belg for Belgicism...)
• domain (Med for medicine, Geog for geography, Astron for astronomy...)

The following entry, excerpted from the Collins Robert French Dictionary, illustrates
the systematic use of such metalinguistic labels which can be used to guide the reader 
and help distinguish the various senses of the French word carte (the label gén. is used 
for the default, primary translation): 


carte 1 nf
    (a) (gén) card. ~ (postale) (post)card; ~ de visite visiting card, calling card (US).
    (b) (Jeux) ~ à jouer playing card;
    (c) (Géog) map; (Astron, Mét, Naut) chart. ~ du relief/géologique relief/geological
        map; ~ routière roadmap; ~ du ciel skychart; ~ de la lune chart ou map of the moon
    (d) (au restaurant) menu


Note the use of the swung dash (~), a space-saving device which spares the lexicogra- 
pher the trouble of repeating the headword many times in the entry. 

The entry above clearly shows the advantage of metalinguistic labels to distinguish 
translations that are not interchangeable. Pocket dictionaries, which, for space reasons, 
cannot make use of such a sophisticated system of labels, would only provide a list of pos- 
sible translations, usually separated by commas, which would not do its users any service, 
at least if they are L1-French users who want to use this entry for L2-encoding purposes:


carte nf card, map, chart, menu 
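
The contrast between the labelled entry and the pocket entry can be made concrete with a small Python sketch. It is only an illustration: the Sense class, the translate and pocket_entry functions, and the decision to fall back on the gén sense are assumptions made here, and the entry for carte is reduced to four of its senses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sense:
    labels: tuple        # metalinguistic labels attached to the sense
    translations: tuple  # target-language equivalents

# Simplified version of the carte entry quoted above (subentries omitted).
CARTE = [
    Sense(("gén",), ("card",)),
    Sense(("Géog",), ("map",)),
    Sense(("Astron", "Mét", "Naut"), ("chart",)),
    Sense(("au restaurant",), ("menu",)),
]

def translate(senses, label):
    """Return the translations tagged with `label`, else the default (gén) ones."""
    hits = [t for s in senses if label in s.labels for t in s.translations]
    if hits:
        return hits
    return [t for s in senses if "gén" in s.labels for t in s.translations]

def pocket_entry(senses):
    """Flatten the entry as a label-free pocket dictionary would."""
    return [t for s in senses for t in s.translations]

print(translate(CARTE, "Naut"))   # labelled lookup
print(pocket_entry(CARTE))        # undifferentiated list
```

With the labels, a user encoding a nautical text is led to a single equivalent; the flattened list offers four candidates and no means of choosing between them.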


4.4 LEXICAL COLLOCATIONS 
AS METALINGUISTIC INFORMATION 


The main aim of bilingual dictionaries is to allow foreign-language users to encode 
or decode a text written in L2. Their microstructure therefore needs to incorporate 
information about the syntagmatic properties of lexical items, and in particular about
the idioms and other types of word combinations in which these items participate. 
Collocations, in particular, will feature prominently in bilingual dictionaries. Cowie 
(1994: 3169) defines collocations as ‘associations of two or more lexemes (or roots) 
recognized in and defined by their co-occurrence in a specific range of grammatical
constructions’. In a paper entitled ‘Words shall be known by the company they
keep’, which owes a lot to the Firthian tradition (Firth 1968) that the statement of col- 
location is one of the most fruitful approaches to the study of lexical items and their 
relations, Mackin (1978) tackles the problem of how to teach collocations to foreign 
language learners. He stresses the fundamental distinction between production and 
understanding and argues that collocations do not pose any serious problems in the 
understanding (L2-L1) process. Any non-native speaker is likely to recognize and
understand a collocation, but using collocations and selecting the appropriate term 
is much more difficult and may even be considered as one of the most serious stum- 
bling blocks in language learning (Fontenelle 1997a: 18-20). We use the term ‘lexi- 
cal collocation’ here to refer to the privileged, idiosyncratic relationship that holds 
between some verbs and their subjects and objects (as in pay attention, make a mis- 
take, put out a fire) as well as between some nouns and the adjectives which modify 
them (as in confirmed bachelor, unquenchable thirst), between two nouns (as in a 
school of fish, a swarm of bees), or between a verb and an adverb (as in argue strongly, 
protest vigorously). 

Good bilingual dictionaries such as the Collins Robert French Dictionary (CRFD2) or 
the Oxford-Hachette (OH 1994) owe their reputations to the extensive use they make of 
metalinguistic information about the semantic, syntactic, and combinatory properties 
of words. They use a systematic approach to account for a whole range of collocational 
constraints and selection restrictions, which enables the lexicographer to capture the 
following relations: 


• Typical objects (verb + object noun combinations)
  [CR] shoot 3 vt (fire) gun tirer or lâcher un coup de (at sur); arrow décocher,
       lancer, tirer (at sur); bullet tirer (at sur); rocket, missile lancer (at sur)
  [OH] shoot III vtr 1 (fire) tirer [bullet]; lancer [missile]; tirer, décocher [arrow]
• Typical subjects (subject noun + verb combinations)
  [CR] croak 1 vi (a) [frog] coasser; [raven] croasser; [person] parler d’une voix
       rauque; (*grumble) maugréer, ronchonner
  [OH] trumpet III vtr [group, party] vanter les mérites de [lifestyle, success];
       [newspaper] claironner
       IV vi [elephant] barrir
• Noun-noun combinations
  [CR] bar 1 n a (slab) [metal] barre f; [wood] planche f; [gold] lingot m; [chocolate]
       tablette f
  [OH] cluster I n 1 (group) (of flowers, grapes, berries) grappe f; (of people,
       islands, insects, trees) groupe m; (of flowers) touffe f; (of houses) ensemble
       m; (of ideas) ensemble m; (of diamonds) entourage m;
• Adjective-noun combinations
  [CR] devouring adj hunger, passion dévorant; zeal, enthusiasm ardent
  [OH] confirmed adj [alcoholic, smoker, liar, habit] invétéré; [bachelor, sinner]
       endurci; [admirer] inconditionnel/-elle


The examples in the list above show that the user needs to be familiar with the system 
adopted by each dictionary. The use of italics is crucial if no square brackets or paren- 
theses surround a typical collocation to distinguish it from the translation, as in the 
CR examples illustrating the typical direct objects of a verb. Similarly, the position of 
the bracketed material in the OH dictionary entries is essential information to help the 
reader figure out whether the collocate is the subject of the verb (elephant s.v. trumpet)
or the object of the verb (arrow s.v. shoot).


4.5 USING METALANGUAGE 
TO TRANSLATE TEXTS 


One of the primary goals of bilingual dictionaries is to assist a second-language user 
to translate an L2 text into L1 or an L1 text into L2. Many modern dictionaries
include detailed sections appearing in the prefatory matter to explain how the bilin- 
gual dictionary should be used. Such sections focus quite naturally on how the meta- 
linguistic information about typical selection preferences can be exploited in order to 
disambiguate a word and identify the most appropriate translation depending on the 
context. A whole page in the Oxford-Hachette Dictionary (OH 1994: xxxv) describes 
at great length the process to translate ‘a sophisticated nightclub’ in the phrase ‘They
spent the rest of the evening in a sophisticated nightclub in Mayfair’. The translation of 
sophisticated in this context is particularly challenging: the user is asked to look up 
the word sophisticated and to select the most appropriate numbered sense category 
(1 (smart)), in the following entry: 


sophisticated adj 1 (smart) [person] (worldly, cultured) raffiné, sophistiqué pej; (elegant)
chic inv; [clothes, fashion] recherché; [restaurant, resort] chic inv; [magazine]
sophistiqué; she thinks it’s ~ to smoke elle pense que ça fait chic de fumer; she was
looking very ~ in black elle était très chic en noir; 2 (discriminating) [mind, taste]
raffiné; [audience, public] averti; a book for the more ~ student un livre pour les
étudiants plus avancés; 3 (advanced) [civilization] évolué;...


After selecting the first numbered sense (1), the user is advised to look for the noun
collocate, in square brackets, which is closest to the context, that is, restaurant, and to note
the translation (chic) and the usage information in italics (inv), which means that the 
form of the adjective chic does not change in the feminine or the plural. The translation 
of the whole sentence should then read ‘Ils ont fini la soirée dans une boîte de nuit chic de
Mayfair’. 

Although the process described in this section is accurate, it tends to underestimate 
the cognitive process which enables the user to select the noun collocate in square 
brackets which is closest to the context. The dictionary indeed includes a pair of col- 
locates (restaurant, resort) which does not include the word nightclub, which is the only 
clue in the original source sentence that the user can use to disambiguate sophisticated. 
For space reasons, the lexicographer is obviously not in a position to list all possible col- 
locates of sophisticated and the challenge is precisely to capture the statistically most 
significant collocates as well as the most salient ones in the bilingual entry. Nightclub 
belongs to the same thesaurus class as restaurant or resort (a class that also includes bar,
pub, café, etc.), but this information is implicit in the dictionary and the identification of 
a relationship between the various members of this set is one of the challenging obstacles 
in today’s attempts to perform automatic word sense disambiguation using the colloca- 
tional knowledge included in electronic bilingual lexical resources (see Michiels 2000). 
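
The step that the dictionary leaves implicit, namely moving from nightclub to a listed collocate of the same class, can be sketched in Python as follows. This is a toy illustration under stated assumptions: the collocate-to-translation table is reduced from sense 1 of the OH entry above, while the class name "venue", the function choose_translation, and the fallback strategy are hypothetical and are not taken from any existing system.

```python
# Collocate -> translation pairs, reduced from sense 1 of the OH entry above.
COLLOCATE_TRANSLATIONS = {
    "clothes": "recherché",
    "fashion": "recherché",
    "restaurant": "chic",
    "resort": "chic",
    "magazine": "sophistiqué",
}

# Hypothetical thesaurus class; its members are those named in the text.
THESAURUS_CLASSES = {
    "venue": {"restaurant", "resort", "nightclub", "bar", "pub", "café"},
}

def choose_translation(noun):
    """Pick a translation of 'sophisticated' for the noun it modifies."""
    # 1. The noun is itself listed as a collocate in the entry.
    if noun in COLLOCATE_TRANSLATIONS:
        return COLLOCATE_TRANSLATIONS[noun]
    # 2. Otherwise borrow the translation of a listed collocate
    #    that belongs to the same thesaurus class.
    for members in THESAURUS_CLASSES.values():
        if noun in members:
            for sibling in sorted(members):
                if sibling in COLLOCATE_TRANSLATIONS:
                    return COLLOCATE_TRANSLATIONS[sibling]
    # 3. No evidence either way.
    return None

print(choose_translation("nightclub"))
```

The human user performs step 2 without noticing it; for a program, the thesaurus class has to be supplied explicitly, which is exactly the knowledge the printed entry does not contain.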


4.6 EQUIVALENCE IN BILINGUAL 
DICTIONARIES 




Duval (1991) notes that the role of translations in bilingual dictionaries is to provide 
target-language equivalents of the source-language headword. Users believe that it 
is always possible to translate a word because, they assume, equivalents necessarily 
always exist. Duval sums up the problem as follows: 


In fact, words being signs made up of a signifié which points to the real and a sig- 
nifiant which points to its representation in language, equivalence problems will 
appear on two levels: does the real exist or not in the culture of the speakers of a 
particular language? Does the word which describes the real exist or not in the lan- 
guage of these speakers? (Duval 1991 [2008]: 274) 


Under the entry juillet, the Oxford-Hachette (OH 1994) gives the following equivalence: 
juillet nm July le 14 ~ the Fourteenth of July, Bastille Day 


Although the lexical item is undoubtedly monosemous both in French and in English, 
most English target-language users will not readily understand the associations of 
ideas linked to this word and the equivalence will to some extent depend upon the cul- 
tural awareness of the user of the dictionary. 

In other cases, the signifié may only be present in the cultural universe of the
source-language user. The word prom is a case in point:


[OH] prom n ... 2 US (at high school) bal m de lycéens


Prom is unambiguous for English speakers, but the reality exists only in the source 
language and the bilingual entry would here benefit from a gloss which would pro- 
vide details about the significance of prom (and especially prom night) for American 
secondary school students, and especially for students in their last year of high school. 
Some monolingual learners’ dictionaries like the Macmillan English Dictionary for 
Advanced Learners (second edition, 2007) provide such glosses, which allow the lexi- 
cographer to achieve equivalence: 




In the US, prom night is especially important to SENIORS (=students in their last
year of high school). They buy expensive prom dresses or rent TUXEDOS (=formal
men’s suits), give CORSAGES (=flowers you wear on your dress or jacket) to their
DATES (=person you ask to go with you to the dance), and sometimes ride to the
prom in a rented LIMO (=very long, expensive car).


4.7 CONNOTATION AND DENOTATION 


Users generally expect a bilingual dictionary to provide them with faithful transla- 
tions. However, perfect equivalence is not always achievable and, as noted by Duval 
(1991 [2008]: 275), would require equal levels of denotation and of connotation. The former
corresponds to the reference to the same element of the real world, while the latter
is the reference to the same network of cultural associations linked to the words in both 
languages. So, if the French word parsec is translated as parsec in a French-English dic- 
tionary, no further explanation is required because the English translation has exactly 
the same denotative and connotative value as the French headword. Any reference to 
the fact that parsec refers to an astronomical unit of distance would be superfluous, 
since the bilingual dictionary serves a different purpose from a monolingual diction- 
ary. For some other scientific terms, however, there may be different translations which 
are denotatively equivalent (they refer to the same reality), but which have different 
connotations. The word rubéole in French is a case in point: in the Oxford-Hachette 
Dictionary (OH 1994), it is translated as ‘German measles, rubella spéc’, which indicates 
that the translation rubella is more specialized or more technical and would be more 
likely to be used by a doctor. The words German measles and rubella are therefore inter- 
changeable from a purely semantic point of view, but not with respect to their uses and 
their frequencies. The specific connotation of one of the translations should therefore 
be made explicit via a metalinguistic tag such as spéc in the OH Dictionary, or Tech 
(technical) or Med (medical). 


4.8 USING A CORPUS TO COMPILE 
A BILINGUAL DICTIONARY 




Until the early 1980s, dictionaries, monolingual and bilingual alike, were compiled 
manually and the lexicographer was mainly using his or her knowledge of the source 
and target languages to compile the microstructure of a dictionary and to choose 
examples and definitions. The advent of computer technology in the 1980s then made 
it possible to manipulate huge amounts of textual data as well as to store and retrieve 
lexical data in novel ways. The first corpus-based dictionaries appeared at the 
end of the 1980s (the monolingual Collins Cobuild English Language Dictionary appeared
in 1987). The debate between intuition-based dictionaries and corpus-based diction- 
aries raged for a number of years. Lexicographers were then wondering whether they 
should invent their own examples or should rather use real sentences excerpted from 
large bodies of running texts (Laufer 1992). The widespread availability of corpora 
(initially of a few million words, then soon of up to 100 million words, as with the 
British National Corpus, then of billions of words with the availability of Internet 
data and search engines) revolutionized the lexicographers’ work (compare Kupietz, 
this volume, Kosem, this volume). Access to large quantities of texts is essential for 
the lexicographer, who needs to be able to compute the frequency of lexical items, to 
find out which senses are the most frequent ones, which senses have become obsolete, 
what selection preferences a word typically has, which collocations or idioms it typi- 
cally enters into, etc. Without the help of a corpus, lexicographers are unable to list 
the most typical direct objects of a transitive verb, or the most typical nouns modi- 
fied by a given adjective. Whether the lexicographer is working in a monolingual 
or in a bilingual perspective, the tools required to perform such lexical analyses are 
the same. As is illustrated above, such information about typical word combinations 
and selection preferences is crucial and needs to be captured in a bilingual entry 
to disambiguate polysemous items and help the reader select the most appropriate 
translation. 

Grundy (1996) describes her experience with the creation of the Oxford-Hachette 
French Dictionary (OH 1994), which used an English corpus of about 60 million words,
and a French corpus of around 10 million words of contemporary French. She notes
that there are two approaches concerning the use of corpora: the approach employed
for the creation of the OH 1994 involved the use of two independent corpora, neither
of which included any translation of texts present in the other corpus. Alternatively,
a number of experiments have been carried out with bilingual corpora consisting of 
translated texts. The advantage of the latter method is that bilingual corpora can be 
aligned, at paragraph, sentence, or even at phrase level, which facilitates the task of the 
lexicographer and makes it possible to highlight regularities in the translation of spe- 
cific expressions and idioms. Grundy argues that it is nonetheless definitely preferable 
to use original, untranslated texts, which reduces the risk that the bilingual dictionary 
entries might be biased by some ‘translationese’. 

Grundy describes the three main stages in the compilation of corpus-based 
bilingual dictionaries, viz. analysis, transfer, and synthesis (see also Grundy and 
Rawlinson, this volume). 


4.8.1 Analysis 


The first phase consists of using a corpus to generate the concordances of a given 
lexical item in the form of KWIC lines, and to extract the list of simple and complex 
lexical units, together with their collocational profiles, as can be produced by tools like
Kilgarriff’s Sketch Engine, for instance (Kilgarriff et al. 2004). Sorting KWIC lines on
the right context highlights patterns such as ‘depend + PREP + on’, ‘advantage + PREP
+ on’, or ‘advantage + PREP + in + V-ing’. The various syntactic patterns in which a word
is typically found can then be listed by the source language lexicographer, together 
with their typical semantic characteristics. When analysing the linguistic properties 
of the expression take advantage, Grundy (1996: 139-41) shows that its five most fre- 
quent constructions can be described as follows: PERSON or HUMAN GROUP takes 
advantage of FACILITY (1), EVENT (2), SITUATION (3), PERSON/HUMAN GROUP 
(4), or QUALITY/EMOTION/STATE (5). The semantic category FACILITY comprises 
two domains, GENERAL (for entities such as scholarship, technology, facilities, etc.) 
and COMMERCIAL (for items like offer, credits, lower prices, deal, tax allowance...). 
EVENT covers lexical units such as opportunity, discussion, disaster, etc., and 
SITUATION covers units like situation, circumstances, weather, moment... Lexical 
units illustrating the possible direct objects corresponding to the PERSON/HUMAN 
GROUP include words such as senior citizens, lonely women, patient, workers, etc. 
Finally, the category for QUALITY/EMOTION/STATE includes items such as weak- 
ness, trust, ignorance, generosity, etc. 
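
Sorting concordance lines on the right context, as described at the start of this phase, can be sketched in a few lines of Python. The concordance lines below are invented for illustration and do not come from the OH corpus; the function names sort_on_right and following_word_counts are likewise assumptions of this example.

```python
from collections import Counter

# Invented (left context, node, right context) triples, for illustration only.
KWIC_LINES = [
    ("it will all", "depend", "on the weather"),
    ("they took", "advantage", "of the situation"),
    ("there is an", "advantage", "in being early"),
    ("she took", "advantage", "of his generosity"),
    ("prices", "depend", "on demand"),
]

def sort_on_right(lines):
    """Order lines by node word, then alphabetically by right context."""
    return sorted(lines, key=lambda line: (line[1], line[2].lower()))

def following_word_counts(lines):
    """Count (node, first word to the right) pairs to expose recurrent patterns."""
    return Counter((node, right.split()[0]) for _, node, right in lines)

for left, node, right in sort_on_right(KWIC_LINES):
    print(f"{left:>12} | {node} | {right}")
print(following_word_counts(KWIC_LINES).most_common())
```

Once the lines are sorted, identical prepositions after the node fall next to one another, which is what makes patterns of the ‘node + PREP’ type visible to the lexicographer.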


4.8.2 Transfer 


The transfer phase enables the lexicographer to propose general translations for each of 
the classes identified during the analysis phase. Grundy sums up the transfer phase for 
take advantage as follows, for each of the categories identified above: 


• GENERAL (derive benefit by using): utiliser, profiter de
• COMMERCIAL (derive financial benefit by using): profiter de
• EVENT (react to, in order to do something one wants to do): profiter de
• SITUATION (derive benefit from): utiliser, profiter de
• PERSON/HUMAN GROUP (exploit unfairly): utiliser
• QUALITY/EMOTION/STATE (exploit unfairly): profiter de


The target-language lexicographer can then compare the data generated on the basis 
of the source corpus with parallel contexts found for the proposed translations in 
the target language corpus. Hypotheses can then be verified, which makes it pos- 
sible to confirm the validity of the initial translations and to identify problem areas, 
especially when it comes to assessing the degree of semantic and syntactic equivalence
and spotting potential false friends. Parameters such as the relative frequencies
of SL and TL equivalents also need to be taken into account. For instance, a lower 
frequency of a TL equivalent can point to either a semantic difference or a difference 
in the domain of use. 
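
The transfer summary for take advantage can be held in a simple lookup table, which makes it easy to check which translations cover which classes. The Python sketch below transcribes the six classes listed above; the table name TRANSFER and the two helper functions are assumptions made for this illustration and do not describe the tools actually used for OH 1994.

```python
# Class -> (gloss, candidate French translations), transcribed from the summary above.
TRANSFER = {
    "GENERAL": ("derive benefit by using", ["utiliser", "profiter de"]),
    "COMMERCIAL": ("derive financial benefit by using", ["profiter de"]),
    "EVENT": ("react to, in order to do something one wants to do", ["profiter de"]),
    "SITUATION": ("derive benefit from", ["utiliser", "profiter de"]),
    "PERSON/HUMAN GROUP": ("exploit unfairly", ["utiliser"]),
    "QUALITY/EMOTION/STATE": ("exploit unfairly", ["profiter de"]),
}

def classes_by_translation(table):
    """Invert the table: for each translation, list the classes it was proposed for."""
    inverted = {}
    for semantic_class, (_gloss, translations) in table.items():
        for translation in translations:
            inverted.setdefault(translation, []).append(semantic_class)
    return inverted

def valid_everywhere(table):
    """Return the translations proposed for every single class."""
    sets = [set(translations) for _gloss, translations in table.values()]
    return set.intersection(*sets)

print(classes_by_translation(TRANSFER))
print(valid_everywhere(TRANSFER))
```

On this table, no translation is proposed for all six classes, which suggests why the final entry has to attach indicators to each equivalent.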


4.8.3 Synthesis


The use of a corpus in the analysis and transfer phases makes it possible for the lexi- 
cographer to test the validity of each translation in a real context, as is pointed out by 
Grundy (1996: 146). General conclusions need to be drawn, without forgetting that a 
bilingual dictionary is supposed to provide general translations that are valid in all 
contexts, rather than translations that would be appropriate in some contexts and 
impossible in others. These generalizations will need to meet the encoding and decod- 
ing needs of the user and hence will be accompanied by metalinguistic indicators to 
guide the user and show that some translations are linked to specific selection prefer- 
ences and collocations. Less general translations can also be included, provided a usage 
note or label is included to make any usage restriction explicit. The use of the TL corpus 
will then be crucial to identify any difference in register, countability, voice, or tense 
preference, etc. 


4.9 IMPROVING BILINGUAL DICTIONARIES 
WITH FRAME SEMANTICS 


In this section, I would like to show how access to large corpora and recent advances 
in lexical-semantic theories enable lexicographers to improve bilingual dictionaries.
I will concentrate more specifically on an experiment I carried out with the British 
lexicographer Sue Atkins a few years ago. In this experiment, our plan was to demon- 
strate how Fillmore’s frame semantics (Fillmore et al. 2003; Fontenelle 2003) can help 
improve a bilingual dictionary entry like the verb cook in an English-French diction- 
ary. Atkins (2002a) demonstrated how the description of the English verb could be 
enhanced and I would like to show how the French translation part can benefit from a 
detailed analysis by resorting to concepts such as beneficiary, duration, or causativity/ 
inchoativity. 

The aim of frame semantics is to record how a language links frame elements to the 
syntactic constituents which depend on the lexical items which evoke this frame. The 
objective is to describe the possible constellations of elements which gravitate around 
one of these lexical items. The description can be given in the form of a ‘matrix’ speci- 
fying which element clusters are allowed for a given predicate. The syntactic role 
played by each element also needs to be described. In other words, the idea is to try to 
code the knowledge of the world which is necessary to understand a given scenario. 
Atkins (2002a) shows that the verb cook participates in at least two distinct scenar- 
ios or frames: on the one hand, the Apply_Heat frame, which describes how heat is 
applied to transform raw food into something edible; on the other hand, the
Cooking_Creation frame, which describes a creation process, several ingredients being mixed or
combined. 

Cook:

A) Apply_Heat: to change foodstuffs from raw state by applying heat
B) Cooking_Creation: to prepare food by mixing, combining, and heating the ingredients


A sentence like ‘Cook the meat in a saucepan over a high heat until brown’ corresponds 
to Frame ‘A’ and each constituent can be analysed as follows: 


Target   Food       Container       Temperature        Duration
Cook     the meat   in a saucepan   over a high heat   until brown.
         NP Obj     PP Comp         PP Comp            Ssub Comp


The first row includes the various frame elements: ‘Food’ is realized in the sentence as 
an object Noun Phrase (the meat), while in a saucepan corresponds to the Container 
and is realized as a prepositional phrase. Duration is also an important element, intro- 
duced here by a subordinating conjunction. 
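
The annotated sentence can also be represented as data, which is roughly what is needed if frame descriptions are to be stored and compared across lexical items. The Python sketch below is an assumption-laden illustration: the Constituent class and the two functions are invented for this example and do not reproduce the format of any existing frame-semantic database; only the labels themselves come from the analysis above.

```python
from dataclasses import dataclass

@dataclass
class Constituent:
    text: str
    frame_element: str  # "Target" marks the word that evokes the frame
    phrase_type: str    # e.g. "NP", "PP", "Ssub"; empty for the target
    function: str       # e.g. "Obj", "Comp"; empty for the target

# 'Cook the meat in a saucepan over a high heat until brown' (Apply_Heat frame).
ANNOTATION = [
    Constituent("Cook", "Target", "", ""),
    Constituent("the meat", "Food", "NP", "Obj"),
    Constituent("in a saucepan", "Container", "PP", "Comp"),
    Constituent("over a high heat", "Temperature", "PP", "Comp"),
    Constituent("until brown", "Duration", "Ssub", "Comp"),
]

def valence_pattern(annotation):
    """List (frame element, phrase type, function) for every non-target constituent."""
    return [
        (c.frame_element, c.phrase_type, c.function)
        for c in annotation
        if c.frame_element != "Target"
    ]

def surface(annotation):
    """Rebuild the sentence from its constituents."""
    return " ".join(c.text for c in annotation)

print(surface(ANNOTATION))
print(valence_pattern(ANNOTATION))
```

Collecting such patterns over many corpus sentences would yield the kind of ‘matrix’ of permitted element clusters mentioned earlier.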

This type of analysis shows that the bilingual (E-F) entry created by Sue Atkins
for the first edition of the Collins Robert French Dictionary is not exhaustive and 
fails to record a number of interesting properties of the verb cook. This dictionary 
was originally compiled before large text corpora were made available to lexicogra- 
phers, who were primarily relying upon their intuition and their knowledge of the 
language. 

The analysis of concordances of words like cook or cuisiner, cuire, cuisine in French 
corpora reveals that the following properties are not mentioned in the dictionary 
entry: 


cook [...] 1 n cuisinier m, -ière f. she is a good ~ elle est bonne cuisinière, elle fait bien la cuisine; to
be head or chief ~ and bottle-washer* (in a household) servir de bonne à tout faire; (elsewhere)
être le factotum. 2 cpd: cookbook livre m de cuisine; (Mil, Naut) cookhouse cuisine f; (US)
cookout repas m (cuit) en plein air. 3 vt (a) food (faire) cuire. (fig) to ~ sb's goose* faire son
affaire à qn, régler son compte à qn; (b) (Brit*: falsify) accounts, books truquer, maquiller. 4 vi
[food] cuire; [person] faire la cuisine, cuisiner. she ~s well elle fait bien la cuisine, elle cuisine
bien; what's ~ing?** qu’est-ce qui se mijote?*

cook up* vt sep story, excuse inventer, fabriquer

cooking 1 cuisine (activité). plain / French ~ cuisine bourgeoise / française. 2 cpd utensils de
cuisine; apples, chocolate à cuire. cooking foil papier m d’aluminium; cooking salt gros sel, sel de
cuisine


FIGURE 4.1 The verb cook in the Collins Robert French Dictionary (CRFD1)


THIERRY FONTENELLE


1. High frequency of occurrence of beneficiaries. It is essential to understand the
notion of beneficiary if one wants to understand why cook is best translated as
préparer when the person for whom the food is cooked is mentioned. For She
cooked me a wonderful breakfast (= elle m’a préparé un merveilleux petit déjeuner),
the verbs cuire and cuisiner are not possible, faire is possible, and préparer is much
more common. Yet the translation préparer does not appear anywhere in the
original 1978 entry.


2. The structure ‘cook and/or’ is not mentioned anywhere in the entry either,
despite its frequency in English corpora. In ‘She liked to cook and sew’, the
French verb should be cuisiner or the support verb construction faire la cuisine,
but not cuire. Note that, even if this specific construction is not mentioned,
the original dictionary entry does indicate that the intransitive use of
the verb should be translated as faire la cuisine or cuisiner when the subject is a
person (subsense 4).


3. Nothing in the entry indicates that the causative/inchoative alternation, which
is typical of change-of-state verbs, applies only to cases where cook refers to the
Apply_Heat frame with raw food. The metalinguistic indicator food appears
twice in the entry, to signal that it can function as object of the transitive verb or
as subject of the intransitive verb (meat, food, chicken are the most frequent
collocates for this sense and this alternation).


4. The concept of duration, which corresponds to a frame element in this theory,
is crucial for understanding that cook is translated as ‘laisser cuire’ when
used with a temporal expression (until brown, for 10 minutes ...). The original
entry in the Collins Robert French Dictionary (CRFD1) does not mention this
construction. It could also be added that the direct object is extremely rare
when duration is expressed and that the construction ‘laisser cuire’ in French
is always used in an imperative sentence (e.g. Laissez cuire à petite ébullition
pendant 2 heures).


5. Cooking is frequently translated as cuisine and the dictionary includes this piece
of information. However, it fails to mention that cuisson is another frequent
translation, especially when the temporal element is present (fast cooking = cuisson
rapide; during the cooking = pendant la cuisson) and in many compounds
(jus de cuisson, température de cuisson).


The revised draft for the verb entry then looks as follows (see Atkins 2002a for full 
details about the analysis of the original entry and the lexicographical analysis that led 
to this revision): 
cook
I verb

1 (change from raw state)

1a vt [person] food faire cuire, cuire; [oven] food cuire.

1b vi [food] cuire. The soup cooking on the fire la soupe qui cuisait au feu.

1c vi (in recipes) (faire) cuire, (+ time expression) laisser cuire. cook in a hot
oven (faites) cuire à four très chaud; cook for 10 minutes laissez cuire
pendant 10 minutes.

2 (make meals, prepare)

2a vt meal, dish (prepare) préparer, faire; (stressing actual cooking) cuisiner. To
cook sb a meal, to cook a meal for sb préparer un repas à qn.

2b vi [person] (generally) faire la cuisine; (stressing actual cooking) cuisiner. It’s
Paul who cooks c’est Paul qui fait la cuisine; to cook outside in the open cuisiner
en plein air; she cooks well elle fait bien la cuisine, elle cuisine bien; (in lists) I can’t
sew or cook je ne sais ni coudre ni cuisiner; she used to cook for them elle leur
préparait des repas.

The revised draft entry for the noun cooking looks as follows in Atkins (2002a): 


cooking
I noun

1 (see vb 2: activity, food) cuisine f to do the cooking faire la cuisine; French
cooking la cuisine française

2 (see vb 1: process) cuisson f slow cooking cuisson lente

II modifier

1 de cuisine. cooking smell / utensil odeur f / ustensile m de cuisine

2 de cuisson. cooking liquid / method liquide m / méthode f de cuisson

3 apples, chocolate etc, à cuire


4.10 OTHER TYPES OF BILINGUAL 
DICTIONARIES 


In addition to the traditional bilingual dictionaries listed at the beginning of this article,
a number of alternative types of bilingual resources have been created over the last
twenty years, with varying degrees of success. One of them is the so-called ‘bilingualized
dictionary’, which combines the features of monolingual dictionaries for foreign
learners, with easy-to-understand definitions in the source (L2) language, and the
characteristics of bilingual dictionaries, with translations in the targeted user’s mother
tongue (L1). As noted by Marello (1996: 50-1), who gives the example below from the
Password series published by Kernerman Publishing, such dictionaries have been
mainly developed and marketed in Israel:


scene [...] noun 1 the place where something real or imaginary happens: A murderer
sometimes revisits the scene of his crime; The scene of this opera is laid/set in Switzerland.
○ scène


2 an incident etc. which is seen or remembered: She recalled scenes from her childhood.
○ incident


3 a show of anger: I was very angry but I didn’t want to make a scene. ○ scène


[...]


These bilingualized dictionaries now exist for a wide range of languages (English to 
Arabic, French, Estonian, Spanish, Portuguese, Chinese, etc.). The reader is of course 
free to ignore the translations, in which case the dictionary serves as a purely mono- 
lingual dictionary. It keeps the basic structure of monolingual resources and, as such, 
provides a tool for understanding, rather than mere translations. Given that the reader 
should ideally read the definition and the examples first, the equivalents provided at 
the end of the sense division are supposed to reassure the user and clarify the meaning 
of the headword, if necessary. However, as is pointed out by Heuberger, many users 
tend to skip the monolingual parts and go directly to the translations, which, from 
a didactic point of view, unfortunately reduces their exposure to the L2 (Heuberger, 
this volume). The simplicity of the design and format of bilingualized dictionaries (or
semi-bilingual or hybrid dictionaries, as they are also sometimes called; see Hartmann
1994) makes them interesting tools that provide a bridge between monolingual and
bilingual dictionaries, and they have been found to be effective by some researchers
(Laufer and Hadar 1997).

Another alternative type of bilingual dictionary is provided by the Word Routes
series of bilingual thesauruses with English as a source language and French, Spanish,
Catalan, Italian, Greek, and Brazilian Portuguese as a target language (e.g. Cambridge
Word Routes Anglais-Français, McCarthy and Walter 1994). These thematically organized
dictionaries provide an onomasiological approach to the organization of the lexicon. The
prototypical user of these production-oriented dictionaries, who is a learner of English,
accesses words via two indexes (one in English and one in the target language) which
point to conceptual categories or themes such as beautiful, ugly, easy, war, separate,
cold, hot, fall, or rise, but also to categories of concrete nouns, such as musical instruments,
vehicles, fruit, etc.
4.11 CONCLUSION


Bilingual dictionaries have traditionally received less attention than monolingual 
dictionaries. As noted by Adamska-Sałaciak (2006), at least in the English-speaking
tradition, monolingual dictionaries have been the centre of attention for at least four 
decades. Yet bilingual dictionaries deserve as much respect as monolingual reference 
works. The collocational knowledge they include is in many ways richer than what 
can be found in monolingual dictionaries, precisely because the bilingual lexicogra- 
pher needs to provide as many linguistic clues as possible to allow the user to select 
the appropriate equivalent of a word, and the description of possible contexts is a sine 
qua non for the modelling of equivalence. Bilingual dictionaries are also the type of 
reference work most frequently used by foreign language learners, despite the (mainly 
unfounded) criticism that is often levelled at them (Adamska-Sałaciak 2006: 24-5).
Because they frequently meet many of the needs expressed by learners, they certainly 
deserve a chapter of their own in this handbook, which, together with Arleta
Adamska-Sałaciak’s and others’ contributions, should provide evidence that there is definitely
more to bilingual dictionaries than meets the eye. 
5.1 INTRODUCTION


SYSTEMATICALLY assembled collections of communication acts, known as corpora, 
are now the most important empirical foundation in lexicography and in linguistics in 
general. This chapter aims to answer questions that are generally relevant for the task 
of constructing a corpus that can serve as a sound empirical basis for the creation of 
dictionaries as well as for linguistic research. Starting from theoretical considerations 
of corpus design and representativeness, it also discusses practical issues, such as how 
the primary data in corpora can be enriched with other kinds of information, how raw 
data can be converted to corpora, and how the necessary rights can be acquired. In 
these considerations the construction of a corpus will be viewed primarily as a complex 
optimization task that should best be approached iteratively, for which typically no sin- 
gle optimal solution can be found, and in which costs are a crucial factor. 


5.2 EXISTING CORPORA 


Despite rapid developments in recent years, the best-known corpora used in linguistics 
are probably still the Brown Corpus, which was established in the 1960s and comprises 
a total of about a million words from various text types, and the 100 million-word 
British National Corpus (BNC), which was first published in 1994 (Aston and Burnard 
1998). There are also now some very large corpora for languages other than English that 
are often several times larger than the BNC. Many European countries have established 
large-scale projects to develop and maintain national reference corpora, as, for example,
the Polish National Corpus (see Przepiórkowski et al. 2010), the Czech National
Corpus (SYN2005, Institute of Czech National Corpus, Charles University, Prague), or, 
in Germany, the DWDS corpus (see Geyken 2007) and the German Reference Corpus 
(DeReKo) (Kupietz et al. 2010), which uses an alternative design approach (see Section
5.3.3) and with over 25 billion words is currently one of the largest available corpora.
Since the early 2000s corpora that feed exclusively on the inexhaustible amounts of
text of the World Wide Web have also enjoyed increasing popularity, as for example
the Corpus of Contemporary American English (COCA) (Davies 2010), which is
continuously updated and useable as a monitor corpus, and the more randomly sampled
corpora developed in the context of the Web-as-Corpus kool ynitiative (cf. Baroni et al.
2009), like ukWaC for British English (2 billion words), deWaC for German, frWaC for
French, and itWaC for Italian.

Before constructing a corpus, it is advisable to double check that a resource with
the required properties does not already exist. Good sources for such information are,
for example, the catalogue¹ and the LRE map² of the European Language Resources
Association (ELRA), the catalogue of the Linguistic Data Consortium (LDC), and
the Virtual Language Observatory (VLO)³ that was initiated in the context of the
Common Language Resource Infrastructure (CLARIN). In the latter, small resources
are also listed, and kept up-to-date by the respective creators themselves. Even if the
desired resource does not seem to be available, it might be useful to get in contact with
one of the corpus centres that are also listed in the VLO. The corpus centre might know
how an existing resource could be re-licensed, or whether a comparable resource is already
being developed somewhere, and, if that is not the case, they might give hints on how to
proceed or even detailed guidelines on how to ensure, for example, that you close the
circle, so that the centre can later make your corpus available to others in a sustainable
fashion. Consider that it might sometimes be more reasonable and valuable to re-use or
improve an already existing corpus than to construct a new one from scratch.

¹ <http://catalog.elra.info/> (mainly intended for technological applications, however).
² <http://www.resourcebook.eu/>.
³ <http://www.clarin.eu/vlo/>.


5.3 APPROACHES TO CORPUS DESIGN AND COMPOSITION


5.3.1 Representativeness


The main purpose of a corpus is to allow the generalization of observations from the
corpus itself to the specific language domain that is to be investigated. A necessary
condition for such extrapolations to be justified is that the corpus constitutes a
sufficiently representative sample of the domain in question. Unfortunately, however, one
cannot easily decide whether such a relation holds. There are several reasons for this.
One of them is that the language domain, that is, the basic population the sample is to
represent, typically cannot be defined in a rigorous fashion. What does, for example,
contemporary English mean? If we attempt some naive definition such as ‘all utterances 
among people born in England in the past decade’, then we get the practical problem 
that we do not know what those utterances were. If we did know, we would definitely be 
able to find counter-examples in the form of many utterances occurring in the sample 
that we would not call English, and even if we could come up with a much better defini- 
tion, we would be able to find (not to mention produce spontaneously) any number of 
utterances that are not present in the sample but we would like them to be. The latter 
circumstance also gives rise to the suspicion that our object of investigation is a moving 
target. And indeed, language is at its core something like a dynamically changing and 
growing artefact. An even more fundamental problem for the whole endeavour is that 
the object that we need to define is exactly the unknown object that we would like to 
investigate. 

Despite all these problems with representativeness, some of them serious, research- 
ers did fortunately take the step of starting to use corpora, so that today we know that 
they can indeed be successfully used for investigating language, often with surprising 
results. A promising way to proceed, and one that has met with some level of agreement 
in the corpus linguistic community (cf. Biber 1993), is to apply an iterative bootstrapping
strategy: start with a rough approximation of representativeness, act as if we had a
perfect relation between sample and population, see what we can find out, check if the
findings are justified, and, if possible, close the circle by using the findings to improve 
the corpus with respect to its representativeness. We must not forget, however, that the 
generalizations from our corpus-based findings to a total population may indeed not 
be justified. 


5.3.2 Stratification 


The approach usually taken to get a better approximation of representativeness—not 
only in linguistics but also in other disciplines—is called stratified sampling. It con- 
sists of first dividing the total population into subgroups (strata). Then, the sample is 
not drawn randomly from the total population but rather from the subgroups, so that 
the quantitative distribution of items in the sample is ideally the same as in the total 
population. The general advantage of stratified sampling is that gross sampling errors 
can be avoided and typically a more faithful mapping, that is, better representative- 
ness, can be achieved. Applied to corpora, there are at least two more, closely related, 
advantages: first, sampling becomes more practical. In order to sample, for example, 
voters from an election, you could do something like pick every tenth person coming 
out of the polling station. In order to sample texts, however, you can hardly proceed 
analogously. Texts need to be chosen deliberately in order to capture representatives of 
relevant classes, and it is easier to decide where to look for them if the initially big and 
fuzzy language domain is divided into smaller, more clear-cut units. In addition, the
stratification can also help us to iteratively refine the definition of the domain. 
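
The basic mechanics of stratified sampling can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a corpus-building procedure: the function name stratified_sample, its parameters, and the toy population are all invented here, and within each stratum the sketch still draws at random, whereas texts for a real corpus usually have to be chosen deliberately, as noted above.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(items, stratum_of, sample_size, seed=0):
    """Draw a sample whose stratum shares approximate those of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    sample = []
    for members in strata.values():
        # Proportional allocation: each stratum contributes its population share.
        # Rounding means the total can differ slightly from sample_size in general.
        quota = round(sample_size * len(members) / len(items))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample


# Invented toy population: 60% news, 30% fiction, 10% academic texts.
population = (
    [("news", i) for i in range(600)]
    + [("fiction", i) for i in range(300)]
    + [("academic", i) for i in range(100)]
)

sample = stratified_sample(population, stratum_of=lambda t: t[0], sample_size=50)
counts = Counter(genre for genre, _ in sample)
print(dict(counts))
```

With these proportions the sample keeps the 60/30/10 split of the toy population, which is the property the text describes as avoiding gross sampling errors.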
It is helpful, though not necessary, to first identify the dimensions along which
promising strata can be isolated. For the choice of such dimensions two criteria are
particularly important. First, they should be relevant for the domain with respect to
possible questions or purposes. For example, for the domain of voters in a certain election,
the dimension age could be relevant with respect to the question of predicting the
election results but shoe colour not. The second criterion sounds more like a prerequi- 
site, but it is often a gradable one: the attribute values of the basic elements of the popu- 
lation according to these dimensions should be known or measurable. For example, the 
colour of the underwear (if it were relevant) of the voters would not be easy to deter- 
mine and, for example, their satisfaction would not be straightforwardly measurable. 
In the case of corpora, these two criteria usually lead to a selection of the following, 
non-orthogonal dimensions (and example values): 


• mode (spoken vs. written)
• place (of publication or where the author comes from)
• genre (fiction, news, academic, opinion, journal, etc.)
• text type (interview, comment, novel, short story, political speech, etc.)
• topic domain (politics, economy, sports, science, etc.)
• audience (at which a text is directed)
• time (i.e. the time when a text was originally produced or published)
• register (formal, informal, frozen, casual, etc.)


It has to be noted, however, that in the case of corpora the relevance of these dimensions 
also varies depending on the research question (cf. Atkins and Rundell, 2008: 74f). For 
example, in order to date neologisms, the time dimension would be crucial, whereas 
the text-type dimension might be of minor significance. How well the second crite- 
rion can be met often depends on the source of the data. While for newspaper arti- 
cles the date of first publication can usually be determined with some certainty, this is 
not generally the case for web pages, where it can sometimes only be guessed. A differ- 
ent kind of problem arises with the dimensions genre, text type, topic, and register. In 
these cases the categories and the rules according to which they are assigned cannot 
be found in or inferred from the primary data but need to be defined. Thus, even in 
the approximation to representativeness, some compromises have to be made and we 
are reminded that corpus-based linguistics is not strictly science but at least to a large 
extent an art or craft. 


5.3.3 Balanced and Primordial Samples 


The identification of relevant strata is only half the battle for designing a corpus that 
is representative for its intended domain. The second step is to make sure that the 
quantitative distribution across the strata is ideally the same for the corpus as for the 
domain. The unit typically chosen for the measurement of this distribution is not texts 
but, since for example novels are usually much longer than newspaper articles,
words (and sometimes sentences). In order to stipulate the word shares of the respec- 
tive strata, two different approaches can be distinguished. The traditional approach, 
followed by most well-known corpora, is to define a so-called balanced distribution 
with respect to a selection of strata and to more or less fix this as the target distribution 
in the design phase of the corpus. The British National Corpus, for example, has defined 
such a balanced distribution according to modality (90% written texts and 10% speech 
transcripts) and within the written texts a distribution according to domain (e.g. 21.91% 
imaginative, 11.13% leisure, etc.) and medium (e.g. 58.58% books, 31.08% periodicals) (for 
details, see Aston and Burnard 1998: 29-30). A downside of a strictly balanced design 
approach is that texts must be discarded if their inclusion in the corpus would have the 
consequence that one of the fixed quotas is exceeded—even if the texts come for free. If 
you were compiling a corpus and the Guardian newspaper offered you its whole archive 
for free, would it not be a pity if you had to answer ‘No thanks, we already have enough
newspaper texts, but do you happen to have personal letters?’ At the time when the 
BNC was constructed, however, due to the general lack of electronic texts, there were no 
texts for free and the costs for different kinds of texts were more comparable than they 
are today—partially also because of a lower sensitivity to legal issues and because the 
electronic exploitation of text was still uncommon. 
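
The cost of fixed quotas can be made concrete with a short Python sketch. It is only an illustration: the function try_add, the target size of one million words, and the sizes of the offered texts are assumptions made here. Only the 90/10 split between written and spoken material is taken from the BNC figures quoted above.

```python
def try_add(word_counts, quotas, stratum, n_words):
    """Add a text's words to its stratum unless the fixed quota would be exceeded."""
    if word_counts.get(stratum, 0) + n_words > quotas[stratum]:
        return False  # strictly balanced design: the text has to be discarded
    word_counts[stratum] = word_counts.get(stratum, 0) + n_words
    return True


TOTAL_WORDS = 1_000_000                 # hypothetical target size of the corpus
SHARES = {"written": 90, "spoken": 10}  # percentages, after the BNC split by modality
quotas = {s: TOTAL_WORDS * pct // 100 for s, pct in SHARES.items()}

word_counts = {}
decisions = [
    try_add(word_counts, quotas, "spoken", 80_000),   # fits within 100,000
    try_add(word_counts, quotas, "spoken", 30_000),   # would exceed 100,000
    try_add(word_counts, quotas, "written", 30_000),  # fits within 900,000
]
print(decisions, word_counts)
```

The second text is rejected although it may have been available at no cost, which is exactly the downside of a strictly balanced design described above.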

A different approach has been developed by Cyril Belica for the creation of the 
German Reference Corpus DeReKo (see Kupietz et al. 2010) that is not intended to 
be balanced in any way. The underlying rationale is that the term balanced—just as 
much as the term representative—can only be defined with respect to a given domain 
or population and will also vary according to the differing objectives for which the cor- 
pus will be used. Therefore, using a pre-existing fixed resource as a general-purpose 
or multi-purpose corpus is inefficient as it dictates a specific domain to be analysed. 
Instead, these issues should, as far as possible, be decided in the usage phase of the cor- 
pus depending on the domain that is to be researched and on the specific purpose or 
question in mind. As already indicated, it is impossible to state in general what specific 
proportions of text types can be considered balanced or, even more importantly, what 
the relevant dimensions are in the first place. For example, for corpora to be used for 
the creation of general, learners’, slang, and neologism dictionaries different dimen- 
sions are relevant and different distributions over the strata are desirable. Although 
the whole archive may be used as a sample itself, the principal purpose of DeReKo is to 
serve as a versatile primordial sample (‘Ur-Stichprobe’) from which specialized subsam- 
ples, so-called virtual corpora (‘virtuelle Korpora’), can be drawn. As a consequence, 
the development of DeReKo can focus on the maximization of size and dispersion with 
respect to previously chosen strata, and, iteratively, also with respect to strata identified 
at a later stage, while the composition of task-specific subsamples is left to the usage 
phase. In general, and from an economic point of view, such an approach allows for a 
better exploitation of the available corpus data, as no texts need to be discarded and the
corpus data are reusable for a range of different purposes that would otherwise require 
the creation of new corpora from scratch. 
The downside of this approach is, however, that it puts certain demands on the query
and analysis software and the users. The software needs to support the definition of 
virtual corpora based on the above-mentioned properties of whole texts and ideally 
also on text-internal properties such as, for example, the presence or absence of certain 
words or phrases. In addition, the technical infrastructure needs to be able to ensure 
the persistence of such virtual corpora and the possibility of referring to them in order 
to permit reusability and replicability and in order to allow them to be used as a per- 
sistent reference for the comparison of findings based on other corpora. The users, on 
the other hand, are themselves responsible for defining a corpus composition that is 
appropriate for their purposes, or they need to choose an existing virtual corpus. If 
these prerequisites are met, a continuous extension of the total corpus (while at the 
same time having fixed virtual corpora that can serve as reference) is also straightfor- 
ward. Accordingly, the primordial sample can also serve as a basis for (virtual) moni- 
tor corpora and synchronic corpora that attempt to represent contemporary language 
domains more thoroughly by means of sophisticated (sub-)sampling strategies (cf. 
Belica et al. 2009). 
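
The idea of a virtual corpus as a stored selection over a growing archive can be sketched as follows. This Python example is a hedged illustration, not a description of the DeReKo software: the classes Text and VirtualCorpus, their fields, and the four toy texts are hypothetical. It shows selection by text-external properties (text type, year) and by a text-internal property (presence of a word), and it freezes the resulting list of text identifiers so that the selection stays citable while the archive grows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Text:
    text_id: str
    year: int
    text_type: str
    words: tuple


@dataclass(frozen=True)
class VirtualCorpus:
    """A named selection rule over the primordial sample (hypothetical structure)."""
    name: str
    text_types: frozenset
    first_year: int
    last_year: int
    must_contain: frozenset = frozenset()  # text-internal condition

    def select(self, archive):
        return [
            t for t in archive
            if t.text_type in self.text_types
            and self.first_year <= t.year <= self.last_year
            and self.must_contain <= set(t.words)
        ]


archive = [
    Text("n1", 1998, "news", ("the", "election", "result")),
    Text("n2", 2005, "news", ("a", "new", "law")),
    Text("f1", 2005, "fiction", ("the", "old", "house")),
    Text("n3", 2009, "news", ("the", "blog", "post")),
]

news_2000s = VirtualCorpus("news-2000s", frozenset({"news"}), 2000, 2009)
blog_news = VirtualCorpus("news-2000s-blog", frozenset({"news"}), 2000, 2009,
                          frozenset({"blog"}))

# Freeze the membership so that later findings can refer to exactly these texts.
frozen_ids = tuple(t.text_id for t in news_2000s.select(archive))

# The archive keeps growing; the frozen selection is unaffected.
archive.append(Text("n4", 2008, "news", ("the", "budget")))
current_ids = [t.text_id for t in news_2000s.select(archive)]
blog_ids = [t.text_id for t in blog_news.select(archive)]
print(frozen_ids, current_ids, blog_ids)
```

Storing the frozen identifier list alongside the selection rule is one simple way to obtain the persistence and replicability that the text requires of the technical infrastructure.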

For both of the design approaches just sketched, knowledge about the properties of 
texts is crucial, in order to be able to estimate the generalizability of corpus-based find- 
ings. The importance of such knowledge is not limited to corpus composition but also 
concerns the interpretation of the results of corpus queries. For example, the result of a 
corpus query such as ‘word A is more frequent than word B’ could be further analysed 
to something like ‘A was more frequent than B until year Y depending on text type 
and domain’—again provided that such analyses are supported by the query software. 
Concerning ‘representativeness’ and ‘balance’, it is a notoriously recurring mistake
to use these terms without a reference (‘representative for W’, or ‘balanced with respect 
to X, Y, and Z’). To say a corpus is ‘representative’ or ‘balanced’ without further specifi- 
cation, is about as meaningful as to say ‘it washes whiter’. 


5.3.4 Corpus Size


Another important—if not the most important—factor not only for the generalizabil- 
ity of corpus findings but for finding something in the corpus in the first place, is its 
size. It is again impossible to say in general how big a corpus needs to be (see however 
Biber 1993 for a more profound analysis). What can be said in general, however, is that 
Robert Mercer’s claim ‘more data is better data’ (Church and Mercer 1993) still applies 
to its fullest extent even in the age of gigaword corpora. Provided that the texts it con- 
sists of have a certain dispersion (i.e. they are, for example, not identical copies of each 
other), the larger a corpus is, the more information it contains about its domain, the 
greater the number of significant associations that can be found, the more precise and
detailed the findings that can be made, and the more valid the conclusions will typically be.
This is particularly true, and particularly important, when infrequent phenomena are examined, which is not only the
case when rare words are involved, but also when combinations of words and/or other 
factors are examined. To take an extreme example, in order to investigate the typical
adjectives used in conjunction with the word ‘bloke’ in British young adult literature 
over a time period of the last fifty years in contrast to comparable Australian literature, 
a general purpose corpus would probably have to be very large. As extreme as the exam- 
ple may be, it makes clear that sometimes the absolute number of texts available with 
certain properties is much more important than their relative quantitative distribution 
along the respective property dimensions. Or, to put the horse before the cart, the best 
stratification and (relative) balance do not help when relevant intersections of the cho- 
sen text-internal and/or text-external properties (as time, origin, text type, and audience 
in the example above) turn out to be empty or only sparsely occupied by a few texts. 
Thus Kenneth Church’s (2003) statement ‘while balance is desirable, size is even more 
desirable’ is also true, at least to the extent that for constructing a corpus today, where 
the above-mentioned software requirements can be met, size should probably never be 
sacrificed for whatever notion of balance. 
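
Whether relevant intersections of properties are empty or sparsely occupied can be checked by cross-tabulating the metadata. The following Python sketch is a minimal illustration; the function name cell_counts, the dictionary keys, and the four toy records are assumptions made here.

```python
from collections import Counter
from itertools import product


def cell_counts(texts, dimensions):
    """Count texts in every combination of the observed values of the dimensions."""
    counts = Counter(tuple(t[d] for d in dimensions) for t in texts)
    values = [sorted({t[d] for t in texts}) for d in dimensions]
    # Include combinations that never occur, so that empty cells become visible.
    return {cell: counts[cell] for cell in product(*values)}


# Invented metadata records for four texts.
texts = [
    {"origin": "GB", "audience": "adult"},
    {"origin": "GB", "audience": "adult"},
    {"origin": "GB", "audience": "young adult"},
    {"origin": "AU", "audience": "adult"},
]

cells = cell_counts(texts, ["origin", "audience"])
empty_cells = [cell for cell, n in cells.items() if n == 0]
print(cells)
print(empty_cells)
```

In this toy archive the intersection of Australian origin and young adult audience is empty, so no relative balance between the other cells would make a comparison of the kind sketched in the ‘bloke’ example possible.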


5.4 ENRICHING A CORPUS: METADATA 
AND ANNOTATIONS 


In the preceding sections it was argued that in order to know what a corpus can be used 
for and in order to be able to interpret corpus-based findings, knowledge about the 
texts that a corpus consists of is necessary. In the case of a primordial sample approach,
such knowledge additionally needs to be made explicit in order to allow or to facilitate 
the construction of virtual corpora in an analogous way to that described in Section 
5.3 for the corpus proper. This section deals with the question of what additional kinds 
of enrichments of the primary data can be useful and how they can be determined or 
defined. 

Corpora consist of primary data, that is, the recorded texts or utterances them- 
selves, and usually also metadata and annotations. The term metadata is used for addi- 
tional information on or above the level of texts, while the term annotation is typically 
used for additional information on items below the text level, such as paragraphs, 
passages, phrases, or words. Widely used categories of metadata on the text level are, 
for example, bibliographic information (author, title, publisher, date of publication or 
production, place of publication) and the categories that were used in the context of 
the sampling of the corpus (text type, topic domain, intended audience, register, etc.). 
Analogous to the determination of strata (see Section 5.3.2 above), the corpus creator 
needs to decide (a) what attributes to use, (b) what kinds of values the attributes can 
have, and (c) according to what rules the values are attributed to the texts. The criteria 
for the selection of (a) and (b) can be translated to the following questions: what attrib- 
utes and corresponding value sets could my corpus yield, and which of them could be 
relevant for its use? An additional question for consideration can be what choices have 
been made for other well-established corpora, not only because they have probably
been made wisely, but particularly in the interest of later comparability of the corpora 
and of findings based on them. 
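
The distinction between primary data, metadata, and annotations can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the field names, the invented metadata values, the part-of-speech labels, and the choice to store annotations as character offsets into the unchanged primary text.

```python
# Primary data: the recorded text itself, left unchanged.
primary = "She cooked me breakfast."

# Metadata: information on the level of the whole text (values invented here).
metadata = {
    "title": "Constructed example",
    "year": 2012,
    "text_type": "example sentence",
    "topic_domain": "leisure",
}

# Annotations: information on items below the text level, here words,
# stored as character offsets into the primary data.
annotations = [
    {"start": 0, "end": 3, "layer": "pos", "value": "PRON"},
    {"start": 4, "end": 10, "layer": "pos", "value": "VERB"},
    {"start": 11, "end": 13, "layer": "pos", "value": "PRON"},
    {"start": 14, "end": 23, "layer": "pos", "value": "NOUN"},
]


def annotated_spans(text, annotations, layer):
    """Return (covered text, label) pairs for one annotation layer."""
    return [
        (text[a["start"]:a["end"]], a["value"])
        for a in annotations
        if a["layer"] == layer
    ]


print(metadata["text_type"], annotated_spans(primary, annotations, "pos"))
```

Keeping the three kinds of information apart in this way means that attributes and value sets can be revised later without touching the recorded text.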


5.4.1 Example: Classification of Topic Domains 


At this point in this otherwise rather condensed contribution I will describe the 
dimension or attribute ‘topic domain’ in more detail, in order to illustrate some typi- 
cal procedures and consequences. With nine different values (applied science, arts, 
belief and thought, commerce and finance, imaginative, leisure, natural and pure sci- 
ence, social science, world affairs), the domain classification of the BNC could serve as 
a good starting point. If the BNC taxonomy does not cover the texts in your corpus well
enough, you could add some more categories. If it is too coarse for your purposes, you
could add a second level to the taxonomy that subdivides the top-level layer into
subcategories, for example ‘leisure’ into ‘gardening’, ‘cooking’, and ‘sports’. If that is still
not appropriate for your desired corpus, further sources of inspiration could be other 
corpora, or the taxonomy that is typically used in libraries or archives. An additional 
approach could also be to proceed bottom-up by having a closer look at what domains 
are actually present in your corpus. This can be facilitated by applying an automatic 
unsupervised document clustering to your corpus texts (cf. Karypis and Zhao 2002). 
Such methods are widely used in data mining and information retrieval, and are typi- 
cally capable of grouping texts into a number of clusters that were not given previously, 
solely based on the similarity of their vocabulary. These clusters can then be inspected 
and labelled with topic categories. 
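
The bottom-up idea can be illustrated with a very small sketch. It is not the method of Karypis and Zhao (2002), but a deliberately simple greedy procedure chosen for brevity: each text joins the first cluster whose accumulated vocabulary is similar enough, and otherwise opens a new cluster, so the number of clusters is not fixed in advance. The stop-word list, the similarity threshold of 0.5, and the four example texts are assumptions made purely for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, very short stop-word list; a real one would be much longer.
STOPWORDS = {"the", "and", "in", "on", "a"}

def bag_of_words(text):
    """Word counts of a text, lower-cased and without stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(texts, threshold=0.5):
    """Greedy clustering based solely on vocabulary similarity."""
    clusters = []  # list of (combined word counts, member indices)
    for index, text in enumerate(texts):
        vector = bag_of_words(text)
        for combined, members in clusters:
            if cosine(vector, combined) >= threshold:
                combined.update(vector)  # grow the cluster's vocabulary profile
                members.append(index)
                break
        else:
            clusters.append((vector, [index]))
    return [members for _, members in clusters]

texts = [
    "prune the roses and water the garden beds",
    "the bank raised interest rates on loans",
    "water the garden and plant roses in spring",
    "loans and interest rates worry the bank",
]
groups = cluster(texts)
print(groups)
```

The resulting groups would then be inspected by hand and given labels such as ‘gardening’ or ‘commerce and finance’.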

Assuming that, based on these sources of information, you have arrived at a taxonomy
you are satisfied with, the next question is how to classify all the corpus texts
according to it. Because corpora will usually be too big to do this completely by hand,
machine learning techniques need to be applied. A rough procedure for doing this is
shown in Table 5.1.

Here, too, it should be clear that slightly different decisions will yield different clas- 
sifications, but again, it cannot be said in general which of them is better. The best 
and almost the only way to deal with this is to provide some meta-metadata by docu- 
menting the procedure (step 8) and by attaching confidence estimations (the scores 
in step 7) to the classification. A typical pitfall of such corpus enrichments is that 
there might be some assumption built into the classifier that could later mistakenly 
be interpreted as corpus findings. For example, a corpus query could yield the result 
that the word ‘garden’ is very frequent in the domain ‘leisure’. This is no surprise if 
inside the domain classifier the key word ‘garden’ strongly supports the classification 
of a text as ‘leisure’. The danger of such circular reasoning, or to put it more generally,
the danger of investigating a tool (or the training data, etc. behind it) instead of the 
language is of course most imminent when the question to be investigated is closely 
related to the classification itself, for example: ‘What is the vocabulary typically used
in connection with leisure activities?’ In such cases, it is advisable to make use of the
self-introspection facilities that the classifier toolkit should provide, to examine what
premises are built into it. Note, however, that the danger of mistakenly researching
the tool instead of the language without realizing it is greatest when the use of corpus
enrichment is only the first in a long sequence of steps and thereby far away from corpus
findings.

Table 5.1 Recipe for an automatic domain classification

1. Write a manual with instructions on how to apply the taxonomy.
2. Draw as large as manageable a sample of texts of your corpus.
3. Let at least two independent annotators assign scores for each text with respect to the categories.
4. Compute the total score matrix for the text/category combinations based on the different ratings.
5. Define a threshold score above which every text can be assigned to at most one domain.
6. Use the texts that can be assigned to a domain as training and testing data for a text classification
   tool, as for example the Weka toolkit (Witten et al. 2011).
7. If the trained classifier performs well enough on the manually classified data, apply it to the
   remaining corpus texts and record, for example, the two top rated categories together with their
   respective scores given by the classifier in the metadata of each text.
8. Record the taxonomy and the procedure of the classification in the metadata of the corpus.
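
Steps 3 to 5 of Table 5.1 can be sketched in a few lines. The sketch assumes that each annotator gives every text a score between 0 and 3 per category and that the total score is simply the sum over annotators; the rating scale, the annotator and text identifiers, and the threshold value are all hypothetical. Texts for which no category, or more than one category, exceeds the threshold are left out of the training data.

```python
from collections import defaultdict

# Hypothetical ratings: annotator -> text id -> category -> score (0 to 3).
ratings = {
    "annotator_a": {
        "t1": {"leisure": 3, "arts": 1},
        "t2": {"leisure": 2, "arts": 2},
        "t3": {"arts": 3},
    },
    "annotator_b": {
        "t1": {"leisure": 2, "arts": 0},
        "t2": {"leisure": 2, "arts": 3},
        "t3": {"arts": 2, "leisure": 1},
    },
}

def total_scores(ratings):
    """Step 4: sum the scores of all annotators per text/category pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for per_text in ratings.values():
        for text_id, scores in per_text.items():
            for category, score in scores.items():
                totals[text_id][category] += score
    return totals

def assign(totals, threshold):
    """Step 5: keep only texts with exactly one category above the threshold."""
    assigned = {}
    for text_id, scores in totals.items():
        winners = [c for c, s in scores.items() if s > threshold]
        if len(winners) == 1:
            assigned[text_id] = winners[0]
    return assigned

totals = total_scores(ratings)
training_labels = assign(totals, threshold=4)
print(training_labels)
```

With a lower threshold of 3, text t2 would exceed the threshold for two categories at once and would therefore be dropped, which is why the threshold has to be chosen with the score distribution in view.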

A good approach again, in order to take into account the strong task-dependency of 
text classifications (in analogy to and matching with the virtual corpus approach), is to 
provide for the addition of further classifications carried out in the usage phase of the 
corpus. 


5.4.2 Part-of-speech Annotations 


The most widely used corpus enrichments below the text level, apart from text struc- 
ture annotations, are probably the annotation of token segmentations, lemmata, 
and part-of-speech categories. While a token segmentation is essential for the query 
software to build inverse indices that allow for efficient search, lemma information 
and part-of-speech annotations can enable the user to formulate corpus queries that 
abstract from concrete token strings. This is useful, and indeed necessary, whenever
unspecified members of open word classes are searched for. For example, a query
for a sequence of three adjectives would be impossible to formulate without using 
part-of-speech categories, as the number of different adjectives is essentially unlim- 
ited, and their number in the corpus is potentially very large. Since manual part-of- 
speech annotation, at least for large corpora, is again too costly, the annotation is 
normally performed automatically or semi-automatically with the help of so-called 
Part-of-Speech taggers (POS taggers). For English, in particular, there are quite a lot 
of such tools and some of them come with parameter files for different languages. 
The widely used TreeTagger (Schmid 1994), for example, supports more than ten
languages. POS taggers usually also annotate lemma information, finiteness on verbs,
whether nouns are proper names, or the distinction between prepositions and
postpositions, so that POS annotation typically means morphosyntactic annotation. The
accuracy reported for POS taggers is usually between 93 and 97 per cent for tokens
in texts that are comparable to their training data. While this seems high, note that
with an average sentence length of fourteen words, this means that at least every second
to third sentence will contain an incorrectly tagged token. In addition, the expected
error rate for, for example, texts randomly sampled from the web (cf. Giesbrecht and
Evert 2009) or for text passages that involve linguistically or lexicographically
‘interesting’ phenomena (cf. Belica et al. 2011), will be considerably higher. The main
problem connected with this is that of false negatives, also called type II errors, that
is, the text passages that you will not find, and therefore remain unaware of, because of
incorrect annotations.
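
The sentence-level consequence of a given token accuracy can be checked with a short calculation. The sketch below assumes, as a simplification that is not made in the text, that tagging errors are independent of one another; under that assumption the share of fourteen-word sentences containing at least one error is one minus the token accuracy raised to the fourteenth power.

```python
def share_of_sentences_with_error(token_accuracy, sentence_length=14):
    """Probability that a sentence contains at least one tagging error,
    assuming independent errors on each token."""
    return 1.0 - token_accuracy ** sentence_length

for accuracy in (0.93, 0.95, 0.97):
    share = share_of_sentences_with_error(accuracy)
    print(f"token accuracy {accuracy:.0%}: {share:.0%} of sentences affected")
```

Under this assumption, roughly a third of all sentences are affected at 97 per cent token accuracy and close to two thirds at 93 per cent. In real data errors tend to cluster, so the figures are only a rough guide.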

In addition, POS annotations, just like the domain classifications described in 
Section 5.4.1, are simply interpretations of the primary data and different annotators 
will have different opinions—starting with the tokenization to apply and the tag-set to 
use. A good approach to taking this into account for the construction of a corpus and 
also to tackling the problem of imperfect accuracy is not to rely on a single opinion but 
to apply several different POS taggers. By analysing the different concurrent interpre- 
tations, systematic errors and problem areas can possibly be detected and in the usage 
phase, type I or type II errors can be reduced by taking either the intersection or the 
union of hits based on the different annotation layers. 
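
How several annotation layers can be combined at query time may be sketched as follows, using the query for three consecutive adjectives mentioned in Section 5.4.2. The tag names, the example tokens, and the output of the two hypothetical taggers are invented for illustration, and both layers are assumed to share one tokenization, which in practice cannot be taken for granted.

```python
def adjective_triples(tags, adjective_tag="ADJ"):
    """Start positions of three consecutive tokens tagged as adjectives."""
    return {
        i
        for i in range(len(tags) - 2)
        if all(tag == adjective_tag for tag in tags[i:i + 3])
    }

tokens = ["a", "big", "old", "red", "barn", "and",
          "a", "cold", "dark", "still", "night"]

# Two hypothetical annotation layers that disagree on the token 'still'.
tagger_a = ["DET", "ADJ", "ADJ", "ADJ", "NOUN", "CONJ",
            "DET", "ADJ", "ADJ", "ADJ", "NOUN"]
tagger_b = ["DET", "ADJ", "ADJ", "ADJ", "NOUN", "CONJ",
            "DET", "ADJ", "ADJ", "ADV", "NOUN"]

hits_a = adjective_triples(tagger_a)
hits_b = adjective_triples(tagger_b)

# Intersection: only hits both layers agree on (fewer false positives).
agreed_hits = hits_a & hits_b
# Union: hits found in at least one layer (fewer false negatives).
all_hits = hits_a | hits_b

print(sorted(agreed_hits), sorted(all_hits))
```

The hits that appear in the union but not in the intersection are exactly the places where the taggers disagree, and are therefore good candidates for manual inspection.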


5.5 ENCODING A CORPUS: FROM RAW DATA TO CORPORA


5.5.1 Encoding Format 


In order to make a corpus usable, for example by query software, one has to encode it
in a well-defined format. Also for the sake of interoperability it is strongly advised not 
to reinvent the wheel but to use an established format such as the Corpus Encoding 
Standard for XML (XCES) (Ide et al. 2000) or the format described by the guidelines of 
the Text Encoding Initiative (TEI Consortium 2009). XCES or rather its SGML-based 
precursor CES branched off from the TEI development in the mid-1990s with the aim 
of creating a simple standard format specifically for linguistic corpora. However, it is
recommended to use the TEI guidelines, because XCES has not been developed further
since 2000 and because, with version P5, the TEI guidelines now allow a flexible
adaptation to individual needs for constructing a new corpus. The advantages of TEI
are in particular: sophistication, expressive power, ease of document creation schemes,
active development by a large community, and very detailed documentation. Because
of its applicability to very different tasks and its expressive power it is, however,
advisable to limit the total TEI inventory of elements to the subset that is likely to be
needed in order to facilitate a consistent encoding. With Roma,⁴ the TEI Consortium
offers a special tool to add and remove elements and to generate the corresponding
customized XML schemas. The possibility of such customizations, however, shows that two
encodings of the same corpus data, both compliant with the TEI Guidelines, need not 
be and are actually quite unlikely to be identical. The trade-off between guaranteeable 
compatibility and interoperability on the one side and flexibility and expressive power 
on the other is, however, unavoidable. 
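
To give an impression of what a very small TEI-encoded corpus text looks like, the following sketch builds one with Python's standard XML library. The element names (TEI, teiHeader, fileDesc, titleStmt, publicationStmt, sourceDesc, text, body, p) and the namespace come from the TEI guidelines; the choice of this particular subset, the helper function, and the sample content are assumptions for illustration and not a recommendation made in this chapter.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace

def tei(tag):
    """Qualified element name in the TEI namespace."""
    return f"{{{TEI_NS}}}{tag}"

def build_tei_text(title, source, paragraphs):
    """Build a minimal TEI document: a header with metadata and a body."""
    root = ET.Element(tei("TEI"))
    header = ET.SubElement(root, tei("teiHeader"))
    file_desc = ET.SubElement(header, tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = title
    publication = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(publication, tei("p")).text = "Unpublished sample"
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = source
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for paragraph in paragraphs:
        ET.SubElement(body, tei("p")).text = paragraph
    return root

document = build_tei_text(
    "Sample text",
    "Hypothetical web page",
    ["First paragraph.", "Second paragraph."],
)
xml_string = ET.tostring(document, encoding="unicode")
print(xml_string)
```

A document produced in this way would still have to be validated against the project's customized schema, since well-formed XML is not necessarily valid TEI.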


5.5.2 Conversion of Raw Data 


The raw data on which corpora feed can come in a variety of formats depending on the
sources to be used, for example HTML, PDF, DTP, EPUB, DOC, pseudo-XML,
or XML. In order to convert these formats and sub-formats—each of them poten- 
tially using different conventions—to the desired TEI target format, at least some 
text-technological know-how and some tools will be needed. A generally recom- 
mended approach is to start by first converting all these formats into a correspond- 
ing XML representation by using tools such as tidy, TagSoup, unoconv, pdftoxml, 
or the conversion facilities of OpenOffice.org or Adobe Acrobat. In order to con- 
vert these XML documents to the desired XML format (or intermediate represen- 
tations), the Extensible Stylesheet Language Transformations (XSLT) is the tool of 
choice. XSLT is designed for the conversion of XML documents and can be used as 
a declarative programming language. If the declarative properties of XSLT are used 
comprehensively, XSLT stylesheets are typically far superior to conventional proce- 
dural programs with respect to clarity, maintainability, and robustness to variations 
in the documents to be transformed. In the course of the development of conversion 
programs for mass data, where manual intervention is almost impossible and also 
not recommended in order to make the results reproducible in subsequent runs of 
the conversion process, the use of certain heuristics is normally unavoidable. They 
are needed to remove unwanted data, such as the so-called boilerplates of HTML 
pages (cf. Evert 2007) and particularly to reconstruct information that is encoded 
in so-called visual markup, which usually is easily interpretable for a human reader 
but notoriously ambiguous for automatic processing. For example, if a text passage 
in the raw data is marked as ‘Times New Roman, 11pt, bold’, this could mean that
the passage contains a headline, or a name of a place, a name of a press agency or a 
quotation, or something unknown that has not been observed so far. Such (local)
ambiguities can to a certain degree be resolved by inspecting the context, but typically
not with 100 per cent accuracy. This holds particularly when the contexts are
heterogeneous because of varying text sources, as is the case, for example, when texts
are randomly sampled from the web. For these reasons, the work is usually not yet
complete after an initial conversion of the raw data into the target format. In subsequent
cycles, the resulting corpus data need to be repeatedly investigated for errors
or undesirable properties and the conversion must be corrected accordingly. In the
first cycles, the errors will typically be obvious or brought to light by the validation of
the XML document. After that, especially with larger amounts of data that cannot be
browsed exhaustively, typical errors and those that are expected due to the heuristics
that were used need to be sought actively. Since it is to some extent unknown what
errors to expect and what counts as a conversion error in the first place, it is also
advisable to look for abnormalities by comparing the texts with one another and
by comparing the generated corpus with other corpora. Comparing the texts with one
another can reveal problems such as unintended text duplications, which may lead to
sample biases. Comparison with other corpora can reveal frequency anomalies at the
level of characters or tokens, which indicate either conversion errors or
special characteristics of the generated corpus.

⁴ <http://www.tei-c.org/Roma/>.
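
Both kinds of comparison can be sketched briefly. The first function groups texts that are identical after whitespace and case normalization, which catches only exact duplicates and not near-duplicates. The second flags characters that are far more frequent in the new corpus than in a reference text, as happens when an encoding error leaves replacement characters behind. The normalization, the add-one smoothing, and the factor of 5 are illustrative assumptions.

```python
import hashlib
import re
from collections import Counter, defaultdict

def fingerprint(text):
    """Hash of a text after whitespace and case normalization."""
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

def find_duplicates(texts):
    """Groups of text ids whose normalized content is identical."""
    groups = defaultdict(list)
    for text_id, text in texts.items():
        groups[fingerprint(text)].append(text_id)
    return [ids for ids in groups.values() if len(ids) > 1]

def overused_characters(corpus_text, reference_text, factor=5.0):
    """Characters whose relative frequency in the corpus exceeds `factor`
    times their (add-one smoothed) relative frequency in the reference."""
    corpus = Counter(corpus_text)
    reference = Counter(reference_text)
    flagged = []
    for char, count in corpus.items():
        corpus_rate = count / len(corpus_text)
        reference_rate = (reference[char] + 1) / (len(reference_text) + 1)
        if corpus_rate > factor * reference_rate:
            flagged.append(char)
    return sorted(flagged)

texts = {"a": "Hello  world", "b": "hello world", "c": "something else"}
duplicates = find_duplicates(texts)

# A conversion that lost accented letters versus a clean reference sample.
converted = "na\ufffdve caf\ufffd " * 20
reference = "na\u00efve caf\u00e9 " * 20
suspicious = overused_characters(converted, reference)
print(duplicates, suspicious)
```

Whether a flagged character points to a conversion error or to a genuine property of the corpus still has to be decided by inspecting the affected texts.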


5.6 LICENSING TEXTS FOR CORPORA 


Linguistics is in general confronted with the problem that its primary research data, 
unlike that of many other disciplines, is typically subject to third-parties’ rights—no 
matter if something is published on paper or on the World Wide Web. In the case of 
texts, it is the intellectual property rights of authors and publishers that are particu- 
larly affected; in the case of video and audio data, it is the personal rights of individuals 
included, such as their right to informational self-determination.

Since the problem can ultimately be reduced to a conflict of fundamental rights, 
between academic freedom on the one side and the right to property on the other, it 
is unlikely to disappear even with more modern or more research-friendly copyright 
legislation. To use texts in corpora, it will therefore be necessary to find individual 
compromise solutions that will vary depending on the type of data and its intended 
use. When building a corpus, it is important to seek such a solution at an early stage, 
since it can by no means be ruled out that ultimately no such compromise solution may 
be found. The consequence, which can often be observed in academic corpus building 
projects, is that a corpus that was built with a tremendous amount of effort is eventually
worthless, because no one can use it. 

There are essentially four ways to establish a corpus in a lawful fashion: (1) only
use texts that are generally not protected by copyright, such as official texts (legislative
texts, parliament protocols, international conventions, etc.); (2) use older texts,
where the copyrights have expired (which is usually the case seventy, and sometimes
fifty, years after the author has died);⁵ (3) compile the corpus from units that do not
achieve a sufficient level of originality to be protected by copyright law (depending on
the local copyright legislation or adjudication, that might be the case with individual,
randomly shuffled sentences);⁶ (4) use licences, which grant the rights necessary for the
intended use of the corpus. The last of these is the safest and most common way. At best,
the texts will already be published under sufficiently permissive licences, such as possibly
the GNU Free Documentation Licence (GFDL), or one of several Creative Commons (CC)
licences. Typically, however, licence agreements with the respective rights holders will
have to be concluded (see Table 5.2 for terms that typically should be included in such
agreements).

Table 5.2 Checklist for licence agreements

General purpose of the agreement (helps to keep the licence short and not too detailed).
Subject of the licence (it is advisable to keep this generic and open to later additions).
The place where the data may be stored (is off-site long-term preservation allowed?).
The scope of transferred rights (very important under some jurisdictions).
Reference to an end-user licence agreement (EULA).
Maximum amount of text visible to the end-user.
Liability regulations.
Regulations concerning technical measures to prevent abuse.
Termination or revocability of the agreement (revocability can reduce the reluctance to sign the
agreement very effectively, while usually there will be no need to do so).

Publishers and other rights holders are usually very cooperative when it comes to 
granting simple, that is non-transferable, rights of use to a narrowly defined group 
of people for purely academic, non-commercial purposes. If direct or indirect com- 
mercial applications are intended, it is not unlikely that licence fees will be demanded. 
What, however, counts as commercial and what does not, is a matter of definition. It is 
advisable to treat at least the usual research publications as non-commercial applica- 
tions. The kind of licence fee is also typically a matter of negotiation. Non-monetary 
counter-values that publishing companies could be interested in are, for example, 
advertisements and the permission to use the future corpus or dictionary product. The 
fact that nowadays at least large publishing companies have business models for the 
electronic exploitation of their goods can, on the one hand, speed up the contract
negotiations. On the other hand, however, the departments that deal with electronic
exploitation often still have to prove that they pay off and, like copyright legislation, the
business models are usually not tailored to applications that focus on language as part
of mass research data, rather than, for example, information.

⁵ It is important to note, however, that even if the content itself is not copyright protected according
to (1) and (2), third parties may have acquired intellectual property rights by, for example, processing,
converting, or editing the data.

⁶ In many jurisdictions, however, the so-called moral rights protect the integrity of a work. Moral
rights are non-economic rights that belong to the author even if he transferred his copyright to
another person.

Difficulties can also arise when the rights of use need to be fully or partially trans- 
ferable to third parties (i.e. if you want to make the corpus available to others) or when 
the rights holder needs to provide electronic material that is not yet published as such. 
The reason is that publishers fear economic damage when electronic copies of their 
goods are brought into circulation—especially as the proliferation of electronic copies 
is difficult to control because, unlike for e-book formats, for open, unencrypted text 
documents, as required for corpus compilation, digital rights management and copy 
protection are not applicable. The difficulties to be expected then are high licence fees 
or restrictions to only a few texts, or snippets of text, or texts without metadata. In the
worst case, no licence agreement can be concluded. Typically, compromises will be 
necessary, and, given a fixed budget, it is hardly possible to optimize all factors (lots 
of texts, valuable texts, low licence-fees, rich bibliographic data, unrestricted transfer- 
ability). Even if this is successful for a few licensors, this will not help very much if the 
corpus also contains texts of licensors that grant less favourable terms, because usually 
a common denominator has to be found for the licence under which the whole corpus 
will be released and made available. 


5.7 CONCLUSIONS AND OUTLOOK 


It was shown how the construction or extension of a corpus can be optimized with 
respect to relevant criteria by adopting an iterative approach that allows for the post- 
ponement or refinement of decisions to later stages or cycles. The selected optimiza- 
tion criteria were particularly costs, the average usefulness of a corpus for different 
purposes, its flexible adaptability for particular purposes, its legality, and its scientific 
adequacy concerning representativeness, with respect to varying domains and pur- 
poses, as well as concerning factors such as traceability and replicability. Furthermore, 
it was shown that, today, where costs for acquisition, up-conversion and licensing vary 
greatly with different kinds of texts, the primordial sample design approach might be 
more efficient than a traditional balanced design approach—even when taking into 
account the greater demands on the query software. In the academic sector, current 
developments in research infrastructures such as CLARIN will allow for even greater 
efficiency by making it possible to generalize the primordial sample approach to vir- 
tual corpora distributed over different centres, thereby maximizing the usefulness of 
the individual corpora, while taking into account unavoidable licence restrictions that 
do not allow them to be moved or copied. 
6.1 INTRODUCTION


The last few decades have been very eventful for lexicography. The arrival of computers
has enabled the development of new resources of language data, electronic databases 
called corpora. The availability of these new resources gradually revolutionized not only
lexicography, but also other fields such as linguistics and language teaching. Analyses of
corpora have led to new theories about language and its use, and this has had an effect on
dictionary content and its presentation.

Lately, corpus-based dictionaries have become the norm in lexicography (Kilgarriff 
2000). Lexicographers use corpora to obtain information on the frequency of words 
and phrases, to discover their meanings, and to find examples of authentic usage. Due 
to technological progress, lexicographers are now able to interrogate corpora of more 
than a billion words in size, using sophisticated corpus tools. The introduction of cor- 
pora and corpus tools has had a significant impact on the work of lexicographers. They 
have had to add new skills to their existing repertoire, the skills of using corpus tools, in 
order to be able to analyse the corpus data and obtain the relevant information about a 
word’s usage relatively quickly. 

This chapter presents the functionality of corpus tools from the lexicographer’s 
perspective, that is, focusing on the functions that are most often used in dictionary- 
making. First, the procedures and decisions before the analysis are discussed, as they 
influence how the data are or can be analysed. Then, an overview of different functions 
of corpus tools is given, from basic to advanced functions, as well as functions that have 
been devised specifically for lexicographical purposes. Recent developments such as 
the automation of certain processes are also presented. Finally, the conclusion offers a 
few caveats about the use of corpus tools, and provides some thoughts on future devel- 
opments in corpus tools and their potential impact on the role of lexicographers in the 
dictionary-making process. 
6.2.1 Preparing a Corpus for Analysis


The processes of corpus compilation and annotation (see Kupietz, this volume) determine
the number of different functions of corpus tools that can be used for interrogating
the corpus. The time and effort spent on these two processes must not be
underestimated, as the availability of more functions for the lexicographers results in 
less time-consuming and better, or more comprehensive, data analysis. 

When compiling a corpus, it is important to record as much information on the source
texts as possible, for instance the year of publication, source of publication, author, text 
type, domain, mode (written or spoken), etc. This information is called metadata and is 
recorded in the header part of each document in the corpus. The availability of such infor- 
mation enables the lexicographers to quickly spot any patterns in a word’s distribution in 
the corpus and/or limit searches to different parts of the corpus. For example, informa- 
tion on the subject domain of the text provides a basis for assigning domain labels. In fact, 
types of information such as text type and domain of the text should be recorded accord- 
ing to a classification that considers the labelling system to be used in the dictionary. The 
information on the domain of the text can also be important in the selection of dictionary 
examples; for instance, in the Louvain English for Academic Purposes Dictionary (LEAD) 
examples from many different domains were saved for each headword, as the dictionary 
offers a customizable feature that uses the information on the user’s domain of study, and 
shows only examples from that domain (see Granger and Paquot 2010a). 
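
How such metadata supports lexicographical work can be shown with a toy example. The sketch assumes that every document carries a small header with a domain and a year of publication; the attribute names, the three sample documents, and the word looked up are hypothetical. The first function shows a word's distribution across the values of one metadata attribute, and the list comprehension limits the data to part of the corpus.

```python
from collections import Counter

# Hypothetical mini-corpus: each document has header metadata and a text.
documents = [
    {"domain": "medicine", "year": 2009, "text": "the patient received a stent"},
    {"domain": "medicine", "year": 2011, "text": "a stent was placed in the artery"},
    {"domain": "sport", "year": 2010, "text": "the striker scored in the final"},
]

def distribution(documents, word, attribute="domain"):
    """Occurrences of a word form per value of a metadata attribute."""
    counts = Counter()
    for document in documents:
        counts[document[attribute]] += document["text"].lower().split().count(word)
    return counts

stent_by_domain = distribution(documents, "stent")

# Restrict a search to part of the corpus, here to texts from 2010 onwards.
recent = [d for d in documents if d["year"] >= 2010]
stent_recent = distribution(recent, "stent")

print(dict(stent_by_domain), dict(stent_recent))
```

A word that occurs only in documents of one domain, as ‘stent’ does here, is a natural candidate for the corresponding domain label.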

Equally important for corpus interrogation is corpus annotation, which involves 
the processes of tokenization, lemmatization, and part-of-speech tagging (see Kupietz, 
this volume). Tokenization ensures that each token, normally a word, is surrounded by 
spaces so that it can be identified during searches. Lemmatization assigns the information
on lemma, or the base form, to each word, and this allows the users to search, filter,
list, etc. the data by lemma, rather than having to type in all the forms of the word.
Part-of-speech tagging, namely automatically assigning the part-of-speech information
to a word, is useful in languages where a word form may belong to more than one
lemma; for example, in English, cold may be an adjective, a noun, or an adverb, and it is
useful for a lexicographer to be able to limit the search to an individual part of speech. 
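
The benefit of the three annotation layers can be seen in a small sketch. It assumes a corpus that has already been tokenized, lemmatized, and tagged; the tag names and the example sentence are invented for illustration. A search by lemma finds all forms of a word at once, and adding a part of speech narrows the search to one reading of an ambiguous form such as cold.

```python
from typing import NamedTuple

class Token(NamedTuple):
    form: str   # the word form as it appears in the text
    lemma: str  # the base form assigned by lemmatization
    pos: str    # the part of speech assigned by the tagger

# Hypothetical annotated sentence.
corpus = [
    Token("She", "she", "PRON"), Token("caught", "catch", "VERB"),
    Token("a", "a", "DET"), Token("cold", "cold", "NOUN"),
    Token("in", "in", "ADP"), Token("the", "the", "DET"),
    Token("cold", "cold", "ADJ"), Token("wind", "wind", "NOUN"),
    Token("and", "and", "CONJ"), Token("catches", "catch", "VERB"),
    Token("colds", "cold", "NOUN"), Token("often", "often", "ADV"),
]

def search(tokens, lemma, pos=None):
    """Positions of all tokens with the given lemma and, optionally,
    the given part of speech."""
    return [
        i for i, token in enumerate(tokens)
        if token.lemma == lemma and (pos is None or token.pos == pos)
    ]

catch_hits = search(corpus, "catch")              # finds 'caught' and 'catches'
cold_noun_hits = search(corpus, "cold", "NOUN")   # 'cold' and 'colds' as nouns
cold_adjective_hits = search(corpus, "cold", "ADJ")
print(catch_hits, cold_noun_hits, cold_adjective_hits)
```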


6.2.2 Corpus Tool 


In addition to a well-prepared corpus, lexicographers need access to a good corpus tool. 
A corpus can be provided with extensive metadata and annotation, but that is of little 
use if it is not available to lexicographers for interrogation in a good corpus tool. It is 
noteworthy that lexicographers are probably the most demanding corpus users—they 
require and regularly use the highest number of features in corpus tools. In fact, several
features of corpus tools were originally designed especially for lexicographical pur- 
poses, only to be later found useful by linguists, teachers, and others. 

There are many corpus tools available, and they can be grouped into standalone, 
or computer-based tools, and online tools. Standalone tools are usually corpus-inde- 
pendent, that is, they can be used for analysis of any corpus stored on the user’s com- 
puter. Examples include WordSmith Tools (Scott 2008) and MonoConc Pro (Barlow 
2002). The limitations of standalone tools are mainly in the amount of corpus data they 
can process, which significantly reduces their usefulness for lexicographers, consid- 
ering the size of modern reference corpora. Online tools, on the other hand, can be 
either corpus-bound (and so limited to a particular corpus or corpora) or
corpus-independent. Examples of corpus-bound online tools include KorpusDK (for accessing
a reference corpus of Danish), Corpus de Referencia del Español Actual or CREA
(Spanish reference corpus), and Mark Davies’s tools at <http://corpus.byu.edu>. The
main advantages of online corpus tools are that the users can access corpora from any 
computer, and that they can provide quick searching of very large corpora. Moreover, 
because online tools can process more data, their developers can provide additional 
advanced features for corpus analysis. Consequently, online corpus tools are nowadays 
used in the majority of dictionary projects. 

In recent years, the Sketch Engine, an online tool developed by Adam Kilgarriff and 
others (see Kilgarriff et al. 2004), has become a leading corpus tool for lexicography, 
and most of the advanced functions presented in this chapter are limited to this tool. It 
works with corpora of billions of words, and, most importantly, its authors are making 
sure that features are regularly added according to lexicographers’ needs. The Sketch 
Engine is currently used in lexicographical projects involving languages such as Czech, 
Dutch, English, Estonian, French, Portuguese, and Slovene, to name but a few. 

One of the aspects of corpus tools that is becoming increasingly important is user- 
friendliness. Modern lexicographers have to be very versatile—in addition to their lan- 
guage analysis skills, they need to possess a fair amount of computational skills to be 
able to effectively interrogate a corpus with a corpus tool (e.g. knowing corpus query 
language and corpus annotation), not to mention the ability to use a dictionary writing 
system and other project-specific software. Thus, user-friendly features of corpus tools 
such as an easy-to-understand interface, a localized version of a corpus tool, custom- 
izable elements, on-screen hints and tips, and quick access to documentation and/or 
technical support, go a long way to making a lexicographer’s job easier. 


6.2.3 Role of a Corpus in a Dictionary Project 


The extent to which a corpus tool will be used in a dictionary project depends on the
role a corpus is expected to play in that project. The two main approaches of using corpus
data, corpus-driven and corpus-based, are often juxtaposed by corpus linguists.
According to Tognini-Bonelli (2001), in a corpus-driven approach any statements 
‘are fully consistent with, and reflect directly, the evidence provided by the corpus’
(Tognini-Bonelli 2001: 84), whereas in a corpus-based approach, a corpus is used to
‘expound on, or exemplify, existing theories, that is theories which were not necessarily
derived with initial reference to a corpus’ (Tognini-Bonelli 2001: 66).

In lexicographical practice, however, the differences between the two approaches 
become fuzzy, and it becomes more of a question whether lexicographers want to base 
their dictionary solely, predominantly, or only partly on corpus data. Therefore, in a 
corpus-driven dictionary, ‘the new entries, sense divisions, and definitions are fully 
consistent with, and reflect directly, the evidence of the corpus; examples are used ver- 
batim; recurrent patterns form the basis for lexicographical categories; and the absence 
of an entry, or a pattern in an entry, is a meaningful lexical statement’ (Krishnamurthy
2008: 240). The corpus-driven approach was first used in the Cobuild project that pro- 
duced the Collins Cobuild English Language Dictionary (1987); see Sinclair (1987b) for a 
detailed account of the project. 

Much more widespread in lexicography is the corpus-based approach where a 
corpus still plays a major role, but some lexicographical decisions are not based on 
corpus data, either due to shortcomings of the corpus or due to practical considera- 
tions. Lexicographers may want to include in the dictionary all the words that belong 
to a particular lexical set¹ (e.g. wild animals) even though some words may be rare or
non-existent in the corpus. Ordering of senses may not always follow the corpus fre- 
quency. Corpus examples may be modified to make them more understandable and 
readable for users. Many types of dictionary use a corpus-based approach, with EFL 
dictionaries being at the forefront. 

Some lexicographical projects use corpora in a more limited way, for example 
for a specific dictionary feature, which is often one of the upgrades from the previ- 
ous version of the dictionary. Such an approach could be called corpus-informed, 
although it should be pointed out that in corpus linguistics (e.g. Carter and McCarthy 
2006: 11-12) and teaching (e.g. Carter et al. 2011: 94), as well as in dictionary market- 
ing, corpus-informed often means almost the same thing as corpus-based. Examples 
of corpus-informed lexicographical practice include adding (modified) corpus sen- 
tences to existing entries, marking words that belong to a particular set (e.g. Longman 
Exams Dictionary (2007) labels words from the Academic Word List), or simply con- 
firming that the existing senses, phrases, and constructions are attested in the corpus. 


6.3 ANALYSIS 


Features of corpus tools can be divided into basic and advanced features. Basic features 
use basic-level computer processing, such as counting, ordering, selecting randomly, 


1 A lexical set is ‘a group of words that share a common element of meaning’ (Atkins and Rundell 
2008: 123). 

etc., and leave the data analysis entirely to lexicographers. Advanced features on the 
other hand (attempt to) conduct a partial automatic summary of the data using statisti- 
cal methods, and leave it to lexicographers to inspect and validate the results, as well as 
to conduct more detailed analysis. A good lexicographer will know how to use all the 
functions offered by the corpus tool, and when to use them, and will be aware of their 
benefits and shortcomings. 


6.3.1 Basic Features 


6.3.1.1 Concordance 


The main basic feature of corpus tools, which represents the essence of corpus interrogation 
for lexicographers, is concordance. The concordance is a listing of the occurrences 
of a searched item, with each occurrence shown with a surrounding context. 
There are two different ways of displaying the concordance lines: as sentences, or in the 
KWIC (Key Word in Context) format, with the lines centred according to the searched 
item (see Figure 6.1). The main function of the concordance is to show the word in con- 
text and give lexicographers an idea about its meaning(s). Of course, to identify all 
the different meanings of the word, a lexicographer often has to examine a substan- 
tial number of concordance lines. As Atkins and Rundell (2008: 296) point out, the 
lexicographer is looking for indicators or clues, both ‘internal’ (lexico-grammatical 


Corpus: ukWaC 
Hits: 3383 (2.2 per million) sampled to 20 

 1547       either butkis a system of ethics, or consigns  illustrious  actions to Immortality? Literary fame, 
  393             day of inactivity, I went to the famed,  illustrious  and frankly overrated Ajanta caves. Maybe 
  706         although he has now hung up the boots on an  illustrious  career, he remains an integral part of 
 1035           of Nicholas Brothers fame who, during his  illustrious  career, topped countless bills and appeared 
 1520          still rated as a landmark in both of their  illustrious  careers. A measure of Derrick’s standing 
  106     Birmingham Symphony Orchestra has always had an  illustrious  career. In fact, the first concert was 
 2918       viewing: so I think a look back at Kurosawa's  illustrious  career is entirely justified here. The 
  423             just visible. Lenton has had a long and  illustrious  connection with the lace industry but gradually 
  971      directive and advisory roles respectively, the  illustrious  duo Steven Isserlis and Andras Schiff. 
 1429        Junior Tennis Championships are following in  illustrious  footsteps. Britain's Wimbledon star, Andrew 
10839           at 3 Rodney Place, Clifton. Davy had many  illustrious  friends including the poets Southey and 
 1489      staggering generosity from his gold-plated and  illustrious  guests, as well as from running the Aga 
 7474        about being the 200th preacher in a long and  illustrious  historical heritage. As I look back at 
  342        early season financial problems, matched the  illustrious  league leaders for much of this high scoring 
 8065            Superb links by no means eclipsed by its  illustrious  neighbour Royal Birkdale. Indeed, many 
 8507            Thus, all states are founded through the  illustrious  power of great men. And Hegel adds, not 
 4725       standing on ”...the powerful shoulders of our  illustrious  predecessors” we are able both to advance 
 1933  astrologers in 1963 and has subsequently gained an  illustrious  reputation as an original thinker and stimulating 
 8002            in the fray: For tounge cannot frame the  illustrious  story Of the daring of martyrs on that 
 1460         that only could have ever emanated from the  illustrious  vaults of Bray Studios. I hardly expect 


FIGURE 6.1 A sample of twenty random concordance lines for illustrious in the ukWaC corpus 
environment) and ‘external’ (features of the text such as domain, dialect, and setting) 
that are typical of each meaning, and at the same time distinguish one meaning from 
other meanings of the word. 
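
The KWIC layout can be illustrated with a short Python sketch. It is a minimal illustration only, not the code of any corpus tool: the function name `kwic`, the naive whitespace tokenization, and the two sample sentences are assumptions made for the example.

```python
def kwic(sentences, node, width=30):
    """Return KWIC lines: the node word centred between left and right context."""
    lines = []
    for sent in sentences:
        tokens = sent.split()  # naive whitespace tokenization (assumption)
        for i, tok in enumerate(tokens):
            if tok.lower().strip(".,;:!?") == node:
                left = " ".join(tokens[:i])[-width:]       # keep the nearest context
                right = " ".join(tokens[i + 1:])[:width]
                lines.append(f"{left:>{width}}  {tok}  {right}")
    return lines

# Invented sample sentences, loosely modelled on Figure 6.1.
sample = [
    "He remains an integral part of the club after an illustrious career.",
    "Davy had many illustrious friends including the poets.",
]
for line in kwic(sample, "illustrious"):
    print(line)
```

Because the left context is padded or trimmed to a fixed width, the searched item starts in the same column on every line, which is what makes recurring neighbours easy to spot by eye.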

To regard an indicator as typical of a meaning, there has to be a pattern of its use, 
in other words, it has to occur repeatedly (in several concordance lines). The sort- 
ing function helps lexicographers flesh out the patterns by grouping together con- 
cordance lines with the same pattern. The most commonly used types of sorting are 
sorting by the searched word, which groups concordance lines of individual word 
forms, and sorting by the first word to the left or to the right of the searched word. 
The usefulness of each type of sorting depends on the category, and the language, of 
the searched word; for example, in English, sorting by the first word to the left is typically 
more useful for nouns, whereas sorting by the first word to the right is more useful 
for adjectives (e.g. see Figure 6.1). 
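
Sorting by neighbouring words can be sketched as follows. The data structure (a tuple of left context, node, and right context) and the function names `sort_right` and `sort_left` are assumptions for this illustration; real corpus tools offer the same operation through their interface.

```python
# Each concordance line: (left context tokens, node word, right context tokens).
lines = [
    (["on", "an"], "illustrious", ["career", "he", "remains"]),
    (["had", "many"], "illustrious", ["friends", "including"]),
    (["during", "his"], "illustrious", ["career", "topped"]),
    (["its"], "illustrious", ["neighbour", "Royal", "Birkdale"]),
]

def sort_right(conc):
    """Sort by the first word to the right of the node."""
    return sorted(conc, key=lambda l: l[2][0].lower() if l[2] else "")

def sort_left(conc):
    """Sort by the first word to the left of the node."""
    return sorted(conc, key=lambda l: l[0][-1].lower() if l[0] else "")

for left, node, right in sort_right(lines):
    print(" ".join(left), node.upper(), " ".join(right))
```

With the right-hand sort, the two career lines end up next to each other, which is the grouping effect described above for adjectives.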

Once the lexicographers identify a certain pattern, they want to focus on that par- 
ticular pattern, and after recording and describing the pattern, they want to focus on 
the remaining data. In the Sketch Engine, this is facilitated by the filtering function, 
where the lexicographer can set the positive filter (which shows the data matching 
described criteria) or the negative filter (which excludes the data matching described 
criteria), set the search span, and set the type of search (e.g. search according to the 
lemma or all word forms, or limit the search according to a particular word form). 
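
The logic of positive and negative filtering with a search span can be sketched in Python. This is a simplified illustration and not the Sketch Engine implementation; the helper names `matches`, `positive_filter`, and `negative_filter`, and the sample lines, are hypothetical.

```python
def matches(line, word, span=3):
    """True if `word` occurs within `span` tokens to the left or right of the node."""
    left, _node, right = line
    window = left[max(0, len(left) - span):] + right[:span]
    return word in (t.lower() for t in window)

def positive_filter(conc, word, span=3):
    """Keep only the lines that match the criterion."""
    return [l for l in conc if matches(l, word, span)]

def negative_filter(conc, word, span=3):
    """Drop the lines that match, leaving the remaining data for analysis."""
    return [l for l in conc if not matches(l, word, span)]

# Invented concordance lines: (left tokens, node, right tokens).
conc = [
    (["hung", "up", "the", "boots", "on", "an"], "illustrious", ["career", "he", "remains"]),
    (["Davy", "had", "many"], "illustrious", ["friends", "including"]),
    (["during", "his"], "illustrious", ["career", "topped", "countless"]),
]
recorded = positive_filter(conc, "career", span=1)   # the pattern being described
remaining = negative_filter(conc, "career", span=1)  # what is still to be analysed
```

The two filters partition the concordance, so a lexicographer can describe one pattern and then continue with everything that does not match it.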

The lexicographer can get a quick idea of various meanings of the word by inspecting 
a sample of concordance lines. However, the sample must not be simply the first X con- 
cordance lines, as the concordance output lists occurrences of the searched word in the 
order they are found in the corpus, meaning that the concordance lines at the begin- 
ning of the concordance output could in some cases come from a single document, 
and may contain many instances of the same pattern or phrase. For example, if there 
is a document on global warming at the beginning of the corpus, the first concordance 
lines for global are likely to include many examples of global warming and very few or 
no other meanings or patterns of global. The sampling function that provides a random 
sample of concordance lines helps a lexicographer reduce the possibility of getting a 
distorted picture of the word’s usage. 
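
The difference between taking the first X lines and taking a random sample can be shown with a small sketch. The function name `sample_concordance` and the invented data (forty hits from one document followed by sixty from elsewhere) are assumptions for the example.

```python
import random

def sample_concordance(conc, k, seed=None):
    """Return k concordance lines chosen at random, kept in corpus order."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    if k >= len(conc):
        return list(conc)
    picked = sorted(rng.sample(range(len(conc)), k))
    return [conc[i] for i in picked]

# Hypothetical output: 40 hits from one document, then 60 from other texts.
conc = [("doc1", "global warming")] * 40 + [("other", "global economy")] * 60
first_20 = conc[:20]                              # all from the same document
random_20 = sample_concordance(conc, 20, seed=1)  # drawn from the whole output
```

Here the first twenty lines all come from a single document, whereas the random sample is drawn from across the whole concordance output.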

A mandatory part of every corpus tool is the function that enables the saving of 
search results. This allows lexicographers to save the identified information for later, or 
for further analysis with software other than a corpus tool. In addition, there are also 
more advanced options for saving the corpus data, made especially for saving data into 
dictionary databases, and these options are discussed in Section 6.4.3. 


6.3.1.2 Statistics-based Features 


Frequency plays a vital role in determining whether something can be regarded as 
a pattern, and this section presents different features that lexicographers can use to 
obtain information on frequency and other statistics on word forms, lemmas, etc. 

A wordlist is a list of words (it can be of lemmas, word forms, etc.) and their fre- 
quencies in a given corpus. Wordlists are often consulted at the beginning of a 
Table 6.1 First 100 keywords of the computing subcorpus in the OEC 


Apple, user, Microsoft, software, Windows, Linux, server, internet, Intel, Google, Web, computer, 
technology, PC, network, file, device, online, processor, iPhone, data, game, Mac, storage, application, 
customer, download, web, hardware, product, version, IBM, chip, system, mobile, access, company, 
content, graphics, e-mail, developer, digital, video, security, app, I, code, site, install, update, phone, 
AMD, card, desktop, OS, interface, tool, vendor, search, wireless, feature, platform, CPU, disc, browser, 
copyright, information, database, Sony, machine, broadband, spam, operating, mode, engine, XP, 
patent, iPod, HP, X, memory, drive, design, service, consumer, upgrade, USB, website, use, available, 
page, com, button, solution, BT, PCs, IP, format, screen, backup 


dictionary-making process, when building a provisional headword list,² since the 
word’s frequency or simply its presence in the corpus can determine its inclusion in 
the dictionary. Frequency becomes even more relevant when a dictionary covers only 
part of the language; for example, the makers of a dictionary for advanced learners 
will include more frequent words in their dictionary and omit rare words, as they are 
less likely to be consulted by users. 

Wordlists are also consulted when frequency information is made explicit in the 
dictionary, for example when the lexicographers want to indicate which words belong to 
the top 1,000, 2,000, etc. most frequent words in the corpus, an approach often used in 
advanced learners’ dictionaries. 
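
A wordlist and a frequency-band lookup can be sketched with the Python standard library. The function names `wordlist` and `frequency_band`, the toy token sequence, and the band size used in the usage lines are all assumptions for illustration; a real wordlist would be built from lemmatized corpus data.

```python
from collections import Counter

def wordlist(tokens):
    """Frequency list of (word, frequency) pairs, most frequent first."""
    return Counter(t.lower() for t in tokens).most_common()

def frequency_band(word, wl, band_size=1000):
    """Upper bound of the band a word falls in (1000, 2000, ...), or None if absent."""
    for rank, (w, _freq) in enumerate(wl, start=1):
        if w == word:
            return ((rank - 1) // band_size + 1) * band_size
    return None

tokens = "the cat sat on the mat and the dog sat too".split()
wl = wordlist(tokens)
print(wl[:2])
print(frequency_band("the", wl, band_size=2))
```

A word that is absent from the wordlist returns `None`, which corresponds to the case where simple presence in the corpus decides inclusion in the headword list.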

Another useful feature based on wordlists is Keywords, found in tools such as 
WordSmith and the Sketch Engine.³ A keyword is ‘a word which occurs with unusual 
frequency in a given text’ (Scott 1997: 236), compared to its frequency in a reference 
corpus. Keyword extraction can thus be used by lexicographers to identify words 
and multi-word units that are more common to a specific domain, register, or regional 
subcorpus, and are thus candidates for a label. Table 6.1 for example shows the first 100 
keywords (lemmas) of the Computing subcorpus in the Oxford English Corpus (OEC), 
compared to the entire OEC corpus—many words listed are good candidates for the 
label Computing, including (in at least one of their meanings) software, server, network, 
file, application, and download. 
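
Keyword extraction by comparing normalized frequencies can be sketched as follows. This is only an illustration of the general idea of a ratio of per-million frequencies; the smoothing constant, the function names, and all counts are assumptions, and the sketch does not reproduce the exact formula of any tool.

```python
def per_million(freq, corpus_size):
    """Normalize a raw frequency to occurrences per million words."""
    return freq * 1_000_000 / corpus_size

def keyness(freq_focus, size_focus, freq_ref, size_ref, smoothing=1.0):
    """Ratio of normalized frequencies; `smoothing` avoids division by zero."""
    return (per_million(freq_focus, size_focus) + smoothing) / (
        per_million(freq_ref, size_ref) + smoothing)

# Hypothetical counts: word -> (frequency in subcorpus, frequency in reference corpus).
counts = {"server": (900, 1_000), "download": (400, 600), "the": (60_000, 6_000_000)}
SUB, REF = 1_000_000, 100_000_000  # hypothetical corpus sizes in words

ranked = sorted(
    counts,
    key=lambda w: keyness(counts[w][0], SUB, counts[w][1], REF),
    reverse=True,
)
print(ranked)
```

A word such as the, which is equally frequent per million words in both corpora, receives a score of 1 and drops to the bottom of the list, while domain-specific words rise to the top.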


6.3.1.3 Collocation 


Lexicographers are not only interested in words immediately preceding or following 
the word under analysis, but also in words in the vicinity of it, especially words occur- 
ring noticeably more frequently than others. Words that are found around the ana- 
lysed word (normally within a window of five words to the left and five words to the 


2 As Rundell and Kilgarriff (2011) rightly point out, headword lists evolve during the dictionary 
project, and are only complete at its conclusion. 

3 WordSmith Tools and the Sketch Engine use different formulas to calculate keywords. The 
formula in WordSmith Tools is based on the log likelihood statistic (or chi-square, depending on the 
user’s selection), whereas the Sketch Engine formula is less complex and only compares normalized 
frequencies of the words. 
Table 6.2 First 15 collocates of brilliant for five different statistics 


T-score  MI                MI3         log likelihood  logDice 
be       Cruises.          be          be              absolutely 
[?]      stoppingly        a           a               brilliant 
a        weatherpersons    [?]         [?]             sunshine 
[?]      In't              [?]         [?]             blue 
the      inter-changing    and         and             idea 
and      pulled-off        the         the             color 
of       ripasso           of          of              performance 
it       Beatles-inspired  it          it              fucking 
[?]      passage-work      [?]         [?]             career 
in       wizardy           have        have            utterly 
have     Coomassie         in          in              technically 
to       inside-forward    absolutely  with            sky 
with     Huina             to          he              flash 
he       incandescently    with        but             save 
that     Reser             he          to              piece 


right) with unusual frequency are called collocates, and every corpus tool has a feature, 
normally called Collocation, that produces a list of the collocates of a given word. 

There are different measures used for calculating collocation; the most widely used 
are T-score, MI score (Church and Hanks 1990), MI3 score (Oakes 1998), log likelihood, 
and more recently logDice (Rychlý 2008).⁴ As demonstrated in Table 6.2, showing the 
first fifteen collocation candidates for the adjective brilliant in the OEC, each measure 
has a preference for certain words. For example, T-score, MI3, and log likelihood favour 
(very) frequent words, whereas MI offers rare words which are found predominantly, if 
not solely, in the vicinity of the analysed word. LogDice emphasizes lexical collocation 
candidates, which often give the lexicographer a better idea of the word’s usage and 
meaning (e.g. see a list of collocates for the adjective brilliant in Table 6.2). 
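
The differing preferences of these measures follow from their formulas, which can be written out in a short Python sketch. The formulas are given as they are commonly stated in the collocation literature (log likelihood is omitted for brevity); the corpus size, the node frequency, and the candidate counts are invented for the example.

```python
import math

def t_score(f_xy, f_x, f_y, n):
    """Observed minus expected co-occurrences, scaled by sqrt(observed)."""
    return (f_xy - f_x * f_y / n) / math.sqrt(f_xy)

def mi(f_xy, f_x, f_y, n):
    """Mutual information: log2 of observed over expected co-occurrences."""
    return math.log2(f_xy * n / (f_x * f_y))

def mi3(f_xy, f_x, f_y, n):
    """MI with the joint frequency cubed, which boosts frequent pairs."""
    return math.log2(f_xy ** 3 * n / (f_x * f_y))

def log_dice(f_xy, f_x, f_y):
    """logDice: 14 + log2 of the Dice coefficient; independent of corpus size."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))

N = 10**9          # hypothetical corpus size
F_NODE = 100_000   # hypothetical frequency of the node word
# collocate -> (collocate frequency, co-occurrence frequency); all invented
cands = {"the": (50_000_000, 20_000), "incandescently": (12, 10), "idea": (300_000, 3_000)}

top_t = max(cands, key=lambda c: t_score(cands[c][1], F_NODE, cands[c][0], N))
top_mi = max(cands, key=lambda c: mi(cands[c][1], F_NODE, cands[c][0], N))
top_ld = max(cands, key=lambda c: log_dice(cands[c][1], F_NODE, cands[c][0]))
print(top_t, top_mi, top_ld)
```

With these invented counts, T-score ranks the very frequent function word first, MI ranks the rare adverb first, and logDice ranks the lexical collocate first, which mirrors the pattern visible in Table 6.2.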


6.3.2 Advanced Features 


Over the years, corpora have become larger and larger. In the early 1980s, in the 
beginnings of corpus lexicography, the lexicographers working on the Cobuild pro- 
ject had a 7-million-word corpus at their disposal (Sinclair 1987b). Then, in the 1990s, 
100-million-word corpora, such as the British National Corpus (BNC), became the 
norm. Nowadays, billion-word corpora are fairly common, not only for big languages 
such as English (e.g. OEC, ukWaC) or German (the Deutsches Referenzkorpus), but also 


4 For an overview of different collocation measures, see also Stubbs (1995), Evert and Krenn (2001), 
and Manning and Schütze (2003). 
Table 6.3 Frequencies of selected words in the BNC, ukWaC, and OEC corpora 


                            BNC       ukWaC         OEC 
efficient (adjective)     3,953      63,991      61,748 
blessing (noun)             971      17,568      27,809 
take (verb)             173,412   2,178,773   3,610,526 
awkwardly (adverb)          361       1,310       4,493 
memory (noun)             9,932     137,757     212,945 

for small languages such as Slovene (Gigafida), Czech (Czech National Corpus), and 
Polish (Polish National Corpus). (Compare Kupietz, this volume.) 

Arguably, larger corpora provide more information about the language and the 
behaviour of words, but for lexicographers this means (much) more data to analyse. 
This is exemplified in Table 6.3, which shows the frequencies of a selection of words 
in the three corpora: BNC (100 million words), ukWaC (1.3 billion words), and OEC 
(2 billion words). With such amounts of data to analyse, summarizing features such as 
collocation become even more valuable to lexicographers. To help lexicographers even 
further, developers of corpus tools have devised functions that summarize informa- 
tion about a word’s collocational as well as grammatical behaviour. Such functions are 
Word Sketch in the Sketch Engine tool (Kilgarriff et al. 2004), which is presented here, 
and DeepDict Lexifier (Bick 2009).⁵ 

In Word Sketch, collocates are grouped according to grammatical relation (see 
Figure 6.2), thus enabling lexicographers to spot patterns more quickly. Moreover, 
because each meaning of a word tends to prefer particular collocates (Hoey 2005), the 
collocates in the word sketches can give lexicographers a good idea of different mean- 
ings of the word. For example, in Figure 6.2, the collocates in the grammatical relation 
V* obj N, containing objects of the verb squander, give indications of two different pat- 
terns/meanings of the verb—one related to squandering money, inheritance, wealth, 
and the other one related to squandering an opportunity, chance, advantage, etc. Using 
the clustering feature in Word Sketch, collocates can also be grouped automatically, 
according to the similarity score (see also the Thesaurus function). Clustering 
collocates in the word sketch of squander (see Figure 6.3), we can observe that some of the 
collocates (e.g. see collocates under chance, sum, and money) are grouped according to the 
meanings we identified above. 
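
The basic organization of a word sketch, with collocates grouped by grammatical relation and ranked by salience, can be sketched in Python. This is an illustration of the data layout only: the relation names, the salience scores, and the function name `word_sketch` are hypothetical, and a real word sketch relies on a parsed or tagged corpus and a sketch grammar.

```python
from collections import defaultdict

# Hypothetical triples for squander: (grammatical relation, collocate, salience).
triples = [
    ("object", "chance", 5.2),
    ("object", "money", 3.8),
    ("object", "opportunity", 6.1),
    ("subject", "government", 2.4),
    ("modifier", "recklessly", 4.0),
]

def word_sketch(triples, min_score=0.0):
    """Group collocates by relation and rank each group by salience."""
    sketch = defaultdict(list)
    for relation, collocate, score in triples:
        if score >= min_score:
            sketch[relation].append((collocate, score))
    for relation in sketch:
        sketch[relation].sort(key=lambda cs: cs[1], reverse=True)
    return dict(sketch)

for relation, collocates in word_sketch(triples).items():
    print(relation, [c for c, _ in collocates])
```

Raising `min_score` removes weakly associated collocates, and a relation with no remaining collocates disappears from the output altogether.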

The importance of word sketches for lexicographers has been stressed by Atkins and 
Rundell (2008) who argue for the use of word sketches as a point of departure when 
devising dictionary entries. And in their recent article, Rundell and Kilgarriff (2011) 


5 Another similar function, called Wortprofil, is available on the website of the Digitales 
Wörterbuch der deutschen Sprache (DWDS) project. 
[Figure 6.2: word sketch screenshot. Collocates recoverable from the object column include 
inheritance, hatful, chance, lead, possession, talent, resource, wealth, credibility, windfall, 
proceeds, legacy, prestige, riches, money, advantage, potential, and sum; a second column 
lists apathy, Government, and Washington.] 


FIGURE 6.2 Word Sketch for the verb squander 


report on the increasing use of word sketches not only in the UK, but also in other 
countries in Europe (e.g. Czech Republic, the Netherlands, Slovenia), Asia (e.g. China, 
Japan), and the United States. 

The recent developments of the Word Sketch function include Multiword sketches 
(Kovář 2012), that is, word sketches for a sequence of words. This feature, available only 
in the beta version of the Sketch Engine at the time of writing, can be accessed by either 
typing a phrase in a search window, or by selecting a link before a collocate in the word 
sketch. The benefit of having such a feature is that the lexicographers can now ana- 
lyse the automatic summaries of the collocational and grammatical behaviour of com- 
pounds, phrases, and other multi-word units. 

Further exploitation of word sketch data is made by the Sketch Difference function 
(Kilgarriff et al. 2004, see also Kilgarriff and Kosem 2012), which can be used to compare 
word sketches of two different lemmas in the corpus, word sketches of the same lemma 
in different subcorpora, or even word sketches of individual word forms. The Sketch 
Difference output is divided into three segments: common collocates and patterns (pat- 
terns common to both lemmas or word forms), and respective segments for collocates 
FIGURE 6.3 Clustered view in the word sketch for squander (relation V* obj N) 


and patterns which are specific to one or the other lemma/word form. The compari- 
son of word sketches of two different lemmas is particularly useful when describing the 
differences between synonymous words. Similarly, the sketch difference of the same 
lemma in different subcorpora is useful for identifying those patterns that are common 
to multiple domains, and those that are specific to a particular domain, region, mode, 
etc. Finally, the sketch difference between different word forms is helpful when there is 
evidence of different meanings being more or less connected to a particular word form, 
for example with the singular and plural form of a noun, as in the sketch difference for 
singular and plural form of the noun authority, specifically for the grammatical relation 
‘V obj N’ (verbs that have authority or authorities as an object), shown in Figure 6.4.⁶ 
A lexicographer can observe that one normally stamps, asserts, delegates, etc. authority, 


6 In Sketch Difference, shades of red and green (replaced in Figure 6.4 by horizontal/vertical 
line patterns and shades of grey respectively) are used to show the collocate’s 
strength of relation with one or the other lemma/word form. 
FIGURE 6.4 Grammatical relation in the Sketch Difference for authority and authorities 


and eludes, pressures, petitions, etc. authorities; there are several verbs that are shared 
by both forms, but alert, prompt, convince, etc. are much more commonly used with 
authorities, and question, undermine, and exercise, etc. with authority. 
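
The three-way division of a sketch difference can be approximated with a small Python sketch. It is a simplified illustration, not the Sketch Engine algorithm: the salience scores, the collocate respect, the threshold `min_gap`, and the function name are all assumptions.

```python
def sketch_difference(sketch_a, sketch_b, min_gap=2.0):
    """Classify collocates as mostly-A, common, or mostly-B by their salience gap."""
    result = {"a": [], "common": [], "b": []}
    for coll in sorted(set(sketch_a) | set(sketch_b)):
        gap = sketch_a.get(coll, 0.0) - sketch_b.get(coll, 0.0)
        key = "a" if gap >= min_gap else "b" if gap <= -min_gap else "common"
        result[key].append(coll)
    return result

# Hypothetical salience scores of verbs taking each word form as an object.
authority = {"stamp": 7.1, "assert": 6.8, "question": 5.0, "alert": 1.0, "respect": 4.0}
authorities = {"elude": 6.5, "petition": 6.0, "alert": 5.5, "question": 1.2, "respect": 3.8}

diff = sketch_difference(authority, authorities)
print(diff)
```

A collocate that occurs with both forms, such as question in this invented data, still lands in one of the outer segments when its salience differs strongly between the two forms, which corresponds to the colour shading described in the footnote.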

The Thesaurus function, available in the Sketch Engine, provides lexicographers 
with a list of synonym candidates, or ‘nearest neighbours’ (Kilgarriff et al. 2004: 113), 
for the word. The synonym candidates are listed according to the similarity score, 
which is based on the word sketch information of the two words under comparison. 
So, the higher the similarity score of the synonym candidate, the more collocates (in 
the same grammatical relations) it shares with the word. Specific differences between 
any two words can be examined by using the Sketch Difference function, which can be 
accessed by clicking on any of the words in the Thesaurus list. 
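
The idea of a similarity score based on shared collocates in the same grammatical relations can be illustrated with a deliberately simple stand-in. The sketch below uses plain set overlap (the Jaccard coefficient) over relation and collocate pairs; this is an assumption chosen for clarity and is not the score actually computed by the Sketch Engine. All data are invented.

```python
def similarity(sketch_a, sketch_b):
    """Jaccard overlap of (relation, collocate) pairs; 0.0 when both are empty."""
    union = sketch_a | sketch_b
    return len(sketch_a & sketch_b) / len(union) if union else 0.0

def thesaurus(word, sketches):
    """Rank all other words by their similarity to `word`, highest first."""
    others = ((w, similarity(sketches[word], s)) for w, s in sketches.items() if w != word)
    return sorted(others, key=lambda ws: ws[1], reverse=True)

# Hypothetical word sketch data as sets of (relation, collocate) pairs.
sketches = {
    "argue": {("object", "case"), ("object", "point"), ("subject", "critic"), ("modifier", "strongly")},
    "claim": {("object", "case"), ("object", "victory"), ("subject", "critic"), ("modifier", "strongly")},
    "eat": {("object", "food"), ("subject", "child")},
}
print(thesaurus("argue", sketches))
```

In this toy data claim shares three of the five distinct pairs with argue, whereas eat shares none, so claim is listed first as a synonym candidate.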

Table 6.4 shows the first fifteen items on the thesaurus list for the verb argue in the BNC. 
There are several genuine (near)-synonym candidates, for example claim, suggest, state (for 
the meaning: ‘to give reasons that something is true’), and even some potential antonyms 
such as agree. It is clear that the Thesaurus function, combined with Sketch Difference, is 
valuable to lexicographers making thesauruses; testifying to this is the Oxford Learner’s 
Thesaurus—a dictionary of synonyms project, in the production of which both Thesaurus 
and Sketch Difference functions have been extensively used (Lea 2008). 
Table 6.4 First fifteen lemmas on the Thesaurus list for argue 


Lemma    Score  Freq 
claim    0.255  18,672 
suggest  0.254  28,246 
seek     0.228  16,605 
try      0.218  52,914 
believe  0.217  33,674 
decide   0.217  23,825 
wish     0.216  16,391 
state    0.214  10,249 
fail     0.213  15,878 
attempt  0.209  8,051 
refuse   0.209  10,339 
agree    0.207  22,887 
point    0.207  13,797 
insist   0.204  6,515 
express  0.204  11,959 


6.4 AUTOMATING PARTS OF LEXICOGRAPHICAL WORK 


The functions of corpus tools described in Section 6.3 go a long way towards helping 
lexicographers find, in the large amounts of data available, the information about a 
word that is relevant for the dictionary user. However, these functions still require a 
great deal of lexicographical input, such as identifying any patterns in usage and select- 
ing good examples for the dictionary. Recent developments in corpus lexicography 
have focused on extracting such information from the corpus (semi)-automatically. 
Two main examples of the application of such functionality are automatic extraction of 
label information, and the identification of good dictionary examples. 


6.4.1 Find X: Automatic Extraction of Candidates for a Label 


Labels in a dictionary are used to alert the user that a word’s usage deviates from the 
norm. Labels range from grammatical (e.g. usually plural, often passive), domain 
(Biology, Linguistics), and register (informal), to labels denoting regional variety 
(American English). The information that lexicographers need when deciding when 
Table 6.5 Verbs that occur predominantly in the passive in the BNC 


Ratio  Verb       Freq 
75.1   base       19,201 
74.4   station    557 
73.0   destine    771 
70.3   poise      640 
69.6   doom       520 
67.2   situate    2,025 
66.9   schedule   1,602 
66.5   associate  8,094 
66.4   entitle    2,669 
65.4   embed      688 
65.2   couple     1,421 
59.3   jail       960 
58.2   deem       1,626 
57.2   arm        1,195 
56.9   design     41,662 
56.2   clothe     749 
55.9   flank      551 
55.8   confine    2,663 
55.5   dedicate   1,291 
54.4   compose    2,391 
54.3   convict    1,298 


and which label to use can be obtained by using different features in corpus tools; for 
example, the frequency distribution of different word forms can be used to determine 
whether any particular form dominates in a word’s usage. Similarly, the analysis of 
the frequency distribution of a word across subcorpora can help to identify the need 
for domain, register, or regional variety labels. 

This method of assigning labels entails checking the frequency distribution infor- 
mation for each word, which has to be done by both the lexicographers and the edi- 
tors when finalizing the entries. This presupposes a percentage limit that has to be 
exceeded for a label to be used; for example, if we want to use a label usually plural 
with a noun, we need to determine the percentage of plural uses of the noun that 
needs to be exceeded, and the percentage of plural uses when lexicographers should 
consider using the plural form as a headword. Because these limits can change dur- 
ing the making of a dictionary, it is useful to be able to produce a list of candidate 
words for a particular label at any time. This is the purpose of the Find X function 
(Kilgarriff and Rychlý 2008) found in the Sketch Engine, where such criteria are 
described and a search is conducted for all the lemmas in the corpus, returning a 
list of candidates meeting the criteria. For example, Table 6.5 shows a list of verbs 
(minimum frequency = 500) that were found in the passive form in over 54 per cent 
of occurrences in the BNC. 
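
The kind of search that produces a list like Table 6.5 can be sketched in Python. The sketch is not the Find X implementation; the function name and the input format are assumptions. The totals for station and doom are taken from Table 6.5, but every passive count, and both other verbs, are invented for illustration.

```python
def find_passive_candidates(verb_counts, min_freq=500, min_ratio=54.0):
    """Return (ratio, lemma, total) rows for verbs mostly used in the passive.

    verb_counts maps lemma -> (total occurrences, passive occurrences).
    """
    rows = []
    for lemma, (total, passive) in verb_counts.items():
        if total < min_freq:
            continue  # too rare for the ratio to be reliable
        ratio = 100.0 * passive / total
        if ratio > min_ratio:
            rows.append((round(ratio, 1), lemma, total))
    return sorted(rows, reverse=True)

# Passive counts are invented; they are not BNC figures.
counts = {
    "station": (557, 414),
    "walk": (20_000, 300),
    "doom": (520, 362),
    "rebuff": (120, 100),
}
for row in find_passive_candidates(counts):
    print(row)
```

Because the thresholds are parameters, the candidate list can be regenerated whenever the percentage limit for a label changes during the project.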


6.4.2 Good Dictionary EXamples (GDEX) 


Examples are an extremely important part of a dictionary entry, because they show 
how a word is actually used, thus returning the word to its natural environment in con- 
text after decontextualizing it by listing it in isolation. The main function of examples 
is to complement the definition; sometimes, the definition is hard to understand with- 
out reading the examples (Atkins and Rundell 2008: 454). Examples can also be of great 
help for navigating through longer entries, where users can ‘identify the particular 
sense they are seeking by finding examples that are similar to the one they need or have 
in front of them’ (Fox 1987: 137). 

Now that dictionaries are available in different electronic formats which can store 
large amounts of data, users can be provided with even more examples. However, good 
dictionary examples are notoriously difficult to find, mainly on account of many dif- 
ferent criteria they need to meet, such as typicality and naturalness, informativeness, 
appropriate length, and, in most cases, full-sentence form. Of course, lexicographers 
can decide to modify ‘slightly lacking’ corpus sentences, most often in terms of their 
length and complexity (Krishnamurthy 1987; Landau 2001; Atkins and Rundell 2008), 
and turn them into good examples; however, even such slightly lacking sentences have 
to be found. 

The GDEX tool (Kilgarriff et al. 2008) was designed for this exact purpose of find- 
ing good example candidates in the corpus and offering them to lexicographers ahead 
of poor examples. The criteria used to identify good examples are sentence length, 
keyword position, the presence or absence of rare words, pronouns, web addresses, 
email addresses, etc. GDEX then ranks the examples according to the weighted scores 
obtained with these criteria, and offers the examples with the highest scores first. Since 
characteristics such as typicality or informativeness cannot be used as criteria in 
the heuristics, one must not expect that all the examples at the top of a GDEX list will 
be good; however, experience from dictionary projects such as the Macmillan English 
Dictionary Online and DANTE (Rundell and Kilgarriff 2011) proves that GDEX can 
save a great deal of lexicographers’ time. 
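
A heuristic ranking of this kind can be sketched in Python. The sketch covers only some of the criteria named above (sentence length, rare words, pronouns, web and email addresses) and leaves out keyword position; every weight, threshold, word list, and function name is an assumption made for the example and does not reflect the actual GDEX configuration.

```python
import re

PRONOUNS = {"he", "she", "it", "they", "this", "that", "him", "her", "them"}
URL_OR_EMAIL = re.compile(r"(https?://|www\.|\S+@\S+)")

def gdex_score(sentence, common_words, min_len=6, max_len=25):
    """Return a score between 0 and 1; every weight here is an assumption."""
    words = re.findall(r"[A-Za-z']+", sentence.lower())
    score = 1.0
    if not min_len <= len(words) <= max_len:
        score -= 0.3                      # too short or too long
    if URL_OR_EMAIL.search(sentence):
        score -= 0.4                      # web or email address present
    rare = [w for w in words if w not in common_words]
    score -= 0.1 * min(len(rare), 3)      # penalize rare words (capped at three)
    if words and words[0] in PRONOUNS:
        score -= 0.1                      # initial pronoun needs outside context
    return max(score, 0.0)

def rank_examples(sentences, common_words):
    """Order candidate sentences so that the highest-scoring come first."""
    return sorted(sentences, key=lambda s: gdex_score(s, common_words), reverse=True)

# Hypothetical list of common words and three invented candidate sentences.
common = {"the", "actor", "had", "a", "long", "and", "illustrious", "career", "in",
          "theatre", "it", "was", "see", "for", "more", "details", "of", "his"}
s1 = "The actor had a long and illustrious career in the theatre."
s2 = "See www.example.com for more details of his illustrious career."
s3 = "It was illustrious."
print(rank_examples([s2, s3, s1], common))
```

As in the text, a high score does not guarantee a good example, since typicality and informativeness are not measured; the ranking only moves likely candidates to the top of the list.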

Initially, only GDEX for English was available, but recently versions of GDEX for 
other languages have been and are being developed. For example, in 2011, GDEX 
for Slovene (Kosem et al. 2011) was developed for the purposes of selecting exam- 
ples for the new lexical database of Slovene. In the next couple of years, we can 
expect versions of GDEX for languages such as Czech, Portuguese, Estonian, and 
so forth. Further developments of GDEX include developing different configura- 
tions for different word classes, or even for specific groups of words within each 
word class (see also Section 6.5). 
6.4.3 Features that Link a Corpus Tool and a Dictionary-writing System 


The process of transferring the data, mainly examples, from the corpus into a database, 
normally located in a dictionary-writing system, can also be a time-consuming task for 
lexicographers. Selecting the text of each example, or saving all the concordance lines, 
and using a copy and paste procedure to transfer it into the database is no longer an 
option for dictionary-publishers, for reasons of time and money. 

One option for speeding up the transfer of corpus data into a dictionary database 
is to integrate a corpus tool and a dictionary-writing system in a single piece of soft- 
ware. Such an attempt is made by the TLex Dictionary Production System (Joffe and 
de Schryver 2004); however, the downside of such systems is that an integrated corpus 
query tool can never be as sophisticated as standalone software solutions, and cannot 
be used on larger (reference) corpora without compromising the functionality of the 
dictionary-writing part of the software. 

The other option is to devise features that facilitate quicker transfer of information 
between a corpus tool and the dictionary-writing system; two such features available 
in the Sketch Engine are presented in this chapter. Using the one-click sentence copy- 
ing feature, lexicographers can select and copy to the Clipboard the entire sentence in 
a concordance line with a single click on the icon to the right of the concordance line, 
and then paste it into the dictionary-writing system. A complementary option is mul- 
tiple line selection where more than one icon, and thus sentence, can be selected at a 
time. An additional step, which is especially useful when selecting multiple sentences, 
is the option to export corpus sentences in the format compatible with the database 
structure; to make this possible, an XML template has to be provided which is struc- 
tured according to the dictionary DTD (Document Type Definition). 
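
Exporting selected sentences in a structured format can be sketched with the Python standard library. The element and attribute names used here (`entry`, `headword`, `examples`, `example`, `source`) are hypothetical; in practice they would be dictated by the XML template and the DTD of the dictionary in question.

```python
import xml.etree.ElementTree as ET

def export_examples(headword, sentences, source="corpus"):
    """Wrap selected sentences in XML; the element names are hypothetical."""
    entry = ET.Element("entry")
    ET.SubElement(entry, "headword").text = headword
    examples = ET.SubElement(entry, "examples")
    for sent in sentences:
        example = ET.SubElement(examples, "example", {"source": source})
        example.text = sent  # special characters are escaped on serialization
    return ET.tostring(entry, encoding="unicode")

selected = ["Davy had many illustrious friends.", "He enjoyed a long & illustrious career."]
print(export_examples("illustrious", selected))
```

Generating the markup programmatically means that characters such as the ampersand are escaped automatically, so the exported fragment can be imported into the dictionary-writing system without manual repair.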

The exporting of examples described in the previous paragraphs is done in the con- 
cordance output, so lexicographers may still need to go through pages of concord- 
ance lines in search of useable examples. Even in word sketches, where information 
is summarized and divided according to grammatical behaviour, lexicographers end 
up examining concordance lines of each collocate, both to find good examples and to 
check the validity of information (see also Section 6.5); all this means a considerable 
amount of clicking, selecting, copying, all of which can be very time-consuming. A 
function that streamlines the example selection process even further is called TickBox 
Lexicography (TBL). TBL provides lexicographers with clickable boxes next to each 
collocate in the word sketch, and after selecting the desired collocates, TBL provides a 
preset number of examples per collocate. Lexicographers can then select the examples 
of each collocate that they want to export, and then all the information (examples and 
collocates) can be exported to Clipboard and imported into a dictionary-writing sys- 
tem. As with one-click sentence copying, it is also possible to use an XML template for 
exporting the data in a format compatible with the dictionary DTD. 
TBL is especially useful when multiple collocates are selected at a time; however, this 
means that only a certain number of examples per collocate (up to ten) can be listed 
before the output turns into a long page that requires a fair amount of scrolling. Thus, 
by using the GDEX function, which is supported by TBL, a lexicographer’s chances of 
finding a good example among the ones offered can improve significantly. 


6.5 FROM ANALYSING TO VALIDATING 


Computers have become increasingly important for lexicographers; however, so far 
they have had a supporting role by presenting data, in raw or summarized form, to 
lexicographers who still have to analyse and/or evaluate the information presented to 
them, and devise a dictionary entry from scratch. In 2011, Rundell and Kilgarriff envisaged 
a new form of computer-lexicographer relationship, in which ‘the lexicographer’s 
task changes from selecting and copying data from the software, to validating—in 
the dictionary writing system—the choices made by the computer. Having deleted or 
adjusted anything unwanted, the lexicographer then tidies up and completes the entry’ 
(Rundell and Kilgarriff 2011: 278). In such an approach, a great deal depends on the 
accuracy of the data automatically extracted by a computer program; the lexicographer 
becomes a validator, as opposed to an analyst. 

A form of this approach, described in Kosem et al. (2013), has already been success- 
fully tested on a project developing a new lexical database for Slovene (Gantar and 
Krek 2011),7 on which the present author has collaborated. It involved writing a script 
that extracted collocates of the relevant constructions and grammatical relations of a 
lemma, using a customized sketch grammar (Krek 2012).8 Parameters for the mini- 
mum frequency or salience of a collocate or a grammatical relation could be set. At 
the same time, examples for each of the collocates in the constructions/relations were 
extracted using the GDEX tool (Kosem 2012). 
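
The threshold step described above can be sketched as follows. This is a hedged illustration only, not the script used in the Slovene project: the Collocate record, the function select_collocates, the relation names, and all the numbers are assumptions introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Collocate:
    relation: str     # grammatical relation from the sketch grammar (hypothetical names)
    word: str
    frequency: int    # raw co-occurrence count
    salience: float   # association score; scale is illustrative


def select_collocates(candidates, min_frequency=5, min_salience=3.0):
    """Keep collocates that meet both thresholds, grouped by relation.

    Within each relation the survivors are ranked by salience,
    highest first.
    """
    grouped = {}
    for c in candidates:
        if c.frequency >= min_frequency and c.salience >= min_salience:
            grouped.setdefault(c.relation, []).append(c)
    return {
        relation: sorted(items, key=lambda c: c.salience, reverse=True)
        for relation, items in grouped.items()
    }


# Invented candidates for a single lemma.
candidates = [
    Collocate("modifier", "strong", 40, 7.2),
    Collocate("modifier", "rare", 2, 9.0),      # too infrequent
    Collocate("modifier", "hot", 30, 2.0),      # not salient enough
    Collocate("object_of", "drink", 12, 5.1),
]
selected = select_collocates(candidates)
```

Requiring both thresholds is one possible design; a project could equally accept a collocate that passes either one, which would admit rare but strongly associated items.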

The experiment focused on less polysemous nouns, verbs, adjectives, and adverbs, 
which were selected with the help of sloWNet (Slovene WordNet) and the Dictionary of 
Standard Slovene. For each word class, a special GDEX configuration was devised that 
took into account the characteristics of the word class, and of the examples of selected 
lemmas. All the information for each entry was extracted in an XML file, which was 
then imported into dictionary-writing software. Then lexicographers had to distribute 
constructions and relations, along with collocates and their examples, into different 
senses, provide definitions, and delete any unnecessary or incorrect content. The initial 
results seem promising, and at the time of writing a detailed evaluation of the procedure 
was already underway. 


7 The method of automatic extraction was experimental, and was used towards the end of the 
project, and only for a selection of the entries. 

8 Sketch grammar is a list of definitions of grammatical relations, which are then offered in the 
word sketch, if they are identified in the word’s usage. 


6.6 CONCLUSION 


Corpus tools have evolved over the years, mainly as a response to the increasing needs 
of lexicographers. In the 1980s, especially during the Cobuild project, the functionality 
of corpus tools was limited to concordance and collocation output, and lexicographers 
were still able to analyse the majority, if not all, of the concordance lines in the corpus. 
In the 1990s, the main change for corpus tools was that they had to be able to handle 
corpora of approximately 100 million words, and support more complex searches; for 
lexicographers, this meant that their job was made more difficult as there were more 
data to analyse. In the twenty-first century, corpus size has increased even more, and 
corpus tools have been improved with functions that have automated some of the tasks 
for lexicographers, sparing them some valuable time and effort, and saving dictionary 
publishers money at the same time. 

More and more lexicographical work is becoming automated, and while some may 
see this as a sign of computers posing a threat to the lexicographer’s role in the dic- 
tionary-making process, it may be better to consider the computer-lexicographer rela- 
tionship as a perfect match. Computer programmes, including corpus tools, contribute 
significantly to the making of a dictionary; however, lexicographers are still needed 
for the not-so-straightforward tasks of word sense disambiguation and definition writ- 
ing. Also, the relationship is reciprocal—computer programmes facilitate the lexicog- 
rapher’s work, but can be improved only on the basis of the lexicographer’s experience. 

The arrival of corpora and corpus tools has also benefited dictionary users, both in 
the quality and the quantity of information on language usage. However, dictionary 
users have changed in the last few decades; new technologies such as the internet and 
mobile technology have made them more demanding—they want instant access to 
up-to-date information. Meeting these demands from users will require even tighter 
collaboration between the developers of corpus tools and lexicographers, who will 
need to come up with new ways of presenting information more quickly. One possi- 
ble route, which is especially relevant for lexicographically deprived languages, is to 
offer automatically-generated output on a word’s behaviour to users while the lexicog- 
raphers devise a dictionary entry. This would mean that, at least for a while, the users 
would become active interrogators of the corpus, and the language. 


7.1 INTRODUCTION 


MONOLINGUAL dictionaries devote more time, money, and effort to the writing and 
editing of definitions than to anything else, but this does not translate proportion- 
ally into commensurate user benefits. Studies of dictionary use (as summarized 
for example in Nesi 2013) have shown that the main uses of monolingual diction- 
aries are for quick and superficial checks on spelling and approximate primary 
meaning, rather than for the more elaborate and carefully constructed linguistic 
information and subtle sense distinctions that are contained in most dictionary 
entries. In languages other than English (for example German, Czech, and modern 
Greek), inflection and grammar can also be important aspects of a monolingual 
dictionary alongside statements of meaning (‘definitions’), and are sometimes, perhaps, 
even more important. 

At the same time, definition writers habitually set themselves unachievable goals 
in terms of accuracy and coverage. This is because, like their readers, they typically 
subscribe to the widespread folk belief that it is possible to define word meanings 
in terms of necessary and sufficient conditions for set membership. According to 
this folk belief, a definition of a word, for example canary, should offer a decision 
procedure for picking out all and only canaries and distinguishing them from 
everything that is not a canary. This folk belief can be traced back at least to the 
great logicians and philosophical thinkers of the European Enlightenment such as 
Wilkins and Leibniz, who knew that there was a problem with it but felt that some 
form of idealization, ignoring minor discrepancies and fuzzy boundaries, was a 
price worth paying for the sake of precision and clarity in the definition of terms. 
Some modern lexical semantic theorists, on the other hand, would argue that this 
apparently simple aim is unachievable because it takes insufficient account of the 
vague, fuzzy, and flexible nature of word meaning in natural language, understates 
the role of context, and overlooks the prevalence of metaphorical extension of 
meaning. That some species of birds may or may not be classed as canaries, and 
that some human singers may be referred to metaphorically as canaries, used to be 
regarded as minor irritations; such facts are now regarded by some as important 
properties of natural language. A related source of problems is confusion between 
the nature of scientific concepts and the nature of meaning in natural language. 

Another, equally troublesome folk belief is that each word has a finite, mutually 
exclusive list of senses—a check list—from which the ‘right’ meaning can be plucked 
by some automated decision procedure. In fact, as we shall see, word meaning is much 
subtler, fuzzier, and messier than this folk belief allows. The meaning of a word is very 
often determined to a great extent by its context of use, usually in ways that dictionaries 
say little or nothing about. 

In this chapter, I start by surveying traditional approaches to definition in dictionar- 
ies from the eighteenth century to the present day. I then go on to summarize some of 
the most important discoveries by philosophers and anthropologists about the nature 
of word meaning during the twentieth century, and I close by asking how this is likely 
to affect dictionary writing in the future—should lexicographers be willing to abandon 
the unachievable dream of defining word meaning in terms of necessary and sufficient 
conditions and look instead at the facts of words in use? 


7.2 DEFINITION BY SYNONYM AND PARAPHRASE 


In his great Dictionary of the English Language (1755), Samuel Johnson offers three defi- 
nitions for the word definition (with supporting citations): 


Definition 
1. A short description of a thing by its properties. 


I drew my definition of poetical wit from my particular consideration of him; for 
propriety of thought and words are only to be found in him; and, where they are 
proper, they will be delightful. Dryden. 


2. Decision; determination. 
3. [In logick.] The explication of the essence of a thing by its kind and difference. 


What is man? Not a reasonable animal merely; for that is not an adequate and 
distinguishing definition. Bentley's Serm[ons]. 


(In the citation selected by Johnson from Dryden to support sense 1, ‘him’ refers to the 
classical Latin poet Virgil, although Johnson does not tell us this.) 

Several questions are raised by this dictionary entry, which is representative not only 
of Johnson’s approach to defining but also of subsequent English dictionaries. Among 
them are the following: 


• What principles underlie Johnson’s definitions? 

• Do the three definitions define three mutually exclusive sets of entities or 
concepts? Are there really three separate senses? 

• What is the relationship between concepts in the mind and entities in the 
world? 

• What is the relationship between concepts in the mind of one language user and 
concepts in the minds of all or some other users of the same language? 

• What is the role of supporting citations? 


Discussions of definition distinguish between the definiendum (the word or phrase 
that is to be defined) and the definiens (the word or phrase that is used to define it). The 
first principle of traditional definition is to construct a phrase or select a word that is 
as nearly as possible synonymous with the definiendum. A principle formulated some 
years before Johnson by the philosopher Gottfried Wilhelm Leibniz (1646-1716) and 
apparently circulating widely in the eighteenth century is that of substitutability while 
maintaining the condition of truth: 


Eadem sunt quorum unum potest substitui alteri salva veritate [Two things are the 
same if one can be substituted for the other without affecting the truth]. 


This slogan can be applied to identity of concepts, as denoted by words and phrases. 
One form of words (the definiens) denotes the same concept as that denoted by another 
(the definiendum) if and only if the truth of any statement choosing either form of 
words is unaffected by the choice. Either will do, and the truth (the meaning) remains 
the same. The slogan is of the greatest importance in considering the nature of defini- 
tion, so I shall return to it later in this chapter. Here, we should notice that in the case 
of Johnson’s definition 1 of the word definition, his attempt is less than successful. Even 
allowing for changes in the conventional meaning of the term wit, it is not an adequate 
paraphrase of Dryden’s remark that he ‘drew his definition of poetical wit from a par- 
ticular consideration of [Virgil]’ to say that Dryden drew his ‘short description’ of poet- 
ical wit ‘by its properties’ from a particular consideration of Virgil. The meaning of 
Dryden’s comment on wit simply falls apart if we attempt substitution of the definien- 
dum by Johnson’s proffered definiens. Dryden’s meaning is more like ‘understanding’ 
or ‘conception’ than ‘short description’. 

Other definitions in Johnson’s Dictionary are more successful in terms of substitut- 
ability. For an example we may consider the first definition of the word defiance higher 
up on the same page of the dictionary: 


A challenge; an invitation to fight. 


This definition substitutes well in many contexts, including Johnson’s selected citation 
from Shakespeare’s Romeo and Juliet: 


The firey Tibalt, with his sword prepar’d, 
Which, as he breath’d defiance to my ears, 
He swung about his head. 


If Benvolio had reported that Tybalt breathed ‘a challenge; an invitation to fight’, truth 
would have been preserved. The definition is substitutable in Shakespearean English. 

The point here is not that Johnson did a bad job in defining definition, but rather that 
defining by the principle of synonymic substitutability cannot always be expected to be 
successful, even in the hands of a master lexicographer such as Johnson. Nevertheless, 
synonymic substitutability is generally assumed, even today, to be the target at which 
lexicographers aim or should aim. The meaning of a word can only be defined, if at all, 
by using other words. Outside the futuristic world of multimedia dictionaries, defin- 
ers do not have any medium other than words at their disposal in which to couch their 
definitions. Pointing at an example or a picture of the definiendum (known as osten- 
sive definition) is famously unsatisfactory. Quine (1960) imagined the case of a tribes- 
man who sees a rabbit running by, points at it, and cries out ‘Gavagai!’ The bystanding 
anthropologist is left wondering whether gavagai means ‘rabbit’ or ‘Look at that!’ or 
‘There goes today’s dinner’ or any of an unbounded set of other possible meanings. 
And of course ostensive definition cannot even be attempted as a way of approaching 
abstract concepts such as definition and defiance. 

The next point to notice is that Johnson here (and often elsewhere) allows himself 
two bites at the cherry: defiance is not merely ‘a challenge’ but also ‘an invitation to 
fight’. This ‘two-bites’ strategy, when successful, has at least two functions: first, it 
qualifies the sense in which an ambiguous definiens is being used (i.e. Johnson might 
defend his definition by saying, ‘challenge is being used here in the sense “a summons to 
combat” rather than, say, “a demand of something as due”’); secondly, since there are 
no true synonyms in a language, a second bite can add a slightly different and some- 
times helpful perspective. All too often, however, as Wierzbicka (1992) pointed out, the 
second bite is reduced to the status of a lexicographer’s security blanket—an attempt 
to confirm what has already been said—and may even do more harm than good by 
introducing contradiction or confusion. The lexicographer, having concocted as good 
a substitutable definiens as can reasonably be expected, is tempted to go on to ruin it 
by adding something inappropriate. Changing the metaphor, we may say that accurate 
definition writing in lexicography is as difficult as archery. Not every arrow hits the 
bullseye, and some may even miss the target completely, but firing a second arrow does 
not move the first nearer to the centre. (A good second arrow taken with the first may, 
however, straddle the centre, giving a thoughtful reader a better sense of where this 
‘centre’ might lie.) 

It is hard to know what to make of Johnson’s second sense of definition (‘decision; 
determination’). He offers no supporting citations. On the face of it, it seems to denote 
a quite different sense from the first sense. Against this, we might reasonably claim that 
concocting a ‘short description’ of poetical wit is much the same as determining or decid- 
ing what counts as poetical wit, which would yield a meaning that is the same as sense 
1. If this is right, then the two ‘senses’ are not mutually exclusive; rather, they are (at best) 
attempts to account for different facets of what is essentially the same phenomenon. 

In the light of the discussion so far, we may tentatively propose the following answers 
to the five questions raised near the beginning of this section: 


• Johnson’s definitions are constructed on the principle of synonymic substitutability, 
and this has been true of almost all subsequent dictionaries. In his Preface, he 
commented on this: “The rigour of interpretative lexicography requires that the 
interpretation, and the word explained, should always be reciprocal.’ 

• Johnson’s definitions are not, and are not intended to be, mutually exclusive. This, 
too, is true of almost all subsequent dictionaries. The second definition of a word 
given in a dictionary very often denotes a subset of the first definition, and (insofar 
as a word has a well-established second meaning) this can also be the case with 
second and subsequent senses: a subsequent definiens may denote a subset of a 
preceding definiens. Great dictionaries such as the OED attempt, not always successfully, 
to make such relationships clear(er) by more or less elaborate systems of 
subnumbering (1, 1.1, 1.1a, 1.1b, etc.). The assumption that definitions of a polysemous 
word in a dictionary are intended to reflect a mutually exclusive set of meanings 
is therefore ill-founded. There is often some semantic overlap. For instance, 
in the Oxford Advanced Learner’s Dictionary (ed. 8) sense 1 of pour is defined ‘to 
make a liquid or other substance flow from a container in a continuous stream, 
especially by holding the container at an angle’, while sense 3 is defined ‘to serve 
a drink by letting it flow from a container into a cup or glass’. Closer inspection 
shows that sense 3 covers both transitive and intransitive uses, while sense 1 cov- 
ers only transitive uses, but the partial overlap of the more specific sense with the 
more general one is unmistakable. This kind of regular overlap has caused some 
confusion among naive users, including computational linguists attempting to 
write word-sense disambiguation programs. 

• Johnson’s definitions are informal paraphrases of the observed meanings of 
each word in published texts. Language in this classic eighteenth-century lexicographic 
approach to definition, which is still prevalent today, is seen as a practical 
social artefact for mutual understanding among speakers and hearers or writers 
and readers, rather than as an array of concepts in the mind. Johnson cannot be 
claimed as a precursor of modern psycholinguists. If anything, he—like other 
subsequent lexicographers—tended more towards sociolinguistics: his definitions 
represent by paraphrase the social conventions of word use in English. However, 
above all he was a literary man, not a scientist, and he saw lexicography as a liter- 
ary activity, in which value judgements about matters such as stylistic excellence 
played a more important role than philosophical speculation about matters such 
as the nature of meaning. 

• The role of the citations selected from great writers by Johnson for his dictionary 
is therefore, wherever possible, to offer models of stylistic excellence as well as to 
illustrate meaning and use. 


7.3 ARISTOTELIAN-LEIBNIZIAN FOUNDATIONS FOR DEFINING CONCEPTS 


Johnson’s third sense of definition overlaps with the previous two. There does not seem 
to be any very good reason to believe that ‘the explication of the essence of a thing by its 
kind and difference’ is anything more than ‘a short description of [it] by its properties’. 
In this third definiens, Johnson is not describing a separate sense, but rather alluding 
to the use of the term definition in the logic that was being developed in the eighteenth 
century on the basis of the work of Aristotle twenty centuries earlier and Leibniz half 
a century earlier. Aristotle proposed that a class of entities (including both physical 
objects and abstract concepts, but not single individuals) can be defined by first stat- 
ing the genus term (answering the question, what kind of thing is it?) and then stating 
selected differentiae (answering the question, how is it different from other members 
of the same genus?). This approach has been extraordinarily influential and successful 
in the development of scientific concepts, in particular in natural sciences such as taxo- 
nomic zoology and botany, whose aim is an empirically well-founded classification of 
phenomena in the world around us. It works less well, as we shall see, when applied to 
abstract concepts. 

The Latin terms genus and differentia are inherited from the Christian philosopher 
Boethius (c. 480-524), who used them in his Latin translation of the Greek 
‘Introduction’ (Eisagōgē) to the logic of Aristotle written by Porphyry of Tyre (c. 
232-303). It is through Boethius that we have inherited five of Aristotle’s concepts 
that are important for defining aspects of the world about us: 


Genus: a general term of classification at a fairly broad level. We may propose or 
stipulate, for example, that there is a genus called ‘animal’ and that not only are 
aardvarks, cats, horses, and weasels members of this genus, but so too is a particular 
individual (for example, Fred Jones). There will 
then be some question about whether reptiles, birds, insects, and fish should be 
regarded as members of this genus. Questions such as this can only be decided by 
stipulation according to the needs and observations of the definer. 

Species: a subdivision of genus. We may stipulate that any genus (plural: gen- 
era) can in principle be subdivided into one or more species. Thus, Fred Jones 
may be classed as a member of the species human, while Jasper the cat may be 
classed as a member of the species cat. In this rather primitive example of a 
classification, both human and cat can be specified to be members of the class 
(genus) animal. 

Differentiae: these are distinguishing characteristics of individuals that play a part 
in classifying species and genera. For example, Fred Jones is classed as a human ani- 
mal because he walks on two legs and thinks and speaks rationally; these character- 
istics differentiate him from cats and aardvarks. Jasper is classified as a cat animal 
because he has four legs and whiskers, catches mice, and makes a noise convention- 
ally represented in English as ‘miaow’. These properties of Fred and Jasper respec- 
tively contribute to our definition of the class to which they belong. 

Properties: these are characteristics of individuals that may or may not serve as 
differentiae. Insofar as they serve as differentiae, they contribute to the defini- 
tion of species and genera. But there are also accidental properties of individuals 
(see next). 

Accidents: these are properties of individuals that do not contribute to their clas- 
sification: for example, Fred Jones may have red hair, pimples on his face, and speak 
Welsh. But these properties are accidental; they do not contribute to his classifica- 
tion as a member of the species human. However, his property of being a Welsh 
speaker can provoke further questions. Such a property may contribute to the defi- 
nition of a class of Welsh people; on the other hand, Fred Jones may not be a Welsh 
person at all; he may be a Londoner who has learned the language for fun. 


Definition by genus and differentiae is particularly useful in defining certain kinds of 
scientific concept, for example regular mathematical and geometrical concepts such 
as square and triangle, but has been extended (with more or less success) to a very wide 
range of other concepts. It is important to note that Aristotle and his followers were 
concerned with developing criteria for recognizing classes of entities, not with the 
description of individuals. Aristotle did concern himself, briefly, with the description 
of individuals: an individual has many accidental properties that have nothing to do 
with classification. For example, hair colour is an accidental property of an individual 
human being. Only if we propose a class such as blonde or brunette does hair colour 
become a differentia selecting members of a particular class within the overall genus of 
human beings (more strictly, in this case, female human beings). 

It should be noted at this point that definition by classification into hierarchies 
based on distinguishing species and genera tends to break down and become inoper- 
able when applied to abstract entities such as thought, concept, notion, idea. Is a con- 
cept a species of idea, or is an idea a species of concept? Or are they both co-hyponyms 
of some other genus, say mental process? But then someone might object that concept 
denotes something static, while process implies something that changes over time. 
Further problems arise if we ask whether a proposition is a concept. Proposition is tech- 
nically defined among logicians as a statement that is either true or false, but when 
we hear business people talking about a business proposition, the criterion ‘true/false’ 
(i.e. the differentia ‘has a truth value’) seems to have been replaced by something like 
‘potentially profitable/potentially loss-making’. This is acceptable, of course, because a 
term may have two different meanings in different domains; the point being made here 
is only that a hierarchy of genus and species does not work very well as a way of defining 
abstracts; it probably needs to be replaced by something else. 


7.4 DEFINITION AND STIPULATION 


In several places in his writing, the philosopher Gottfried Wilhelm Leibniz (1646-1716) 
showed an interest in the problems of defining words and concepts. He referred to an 
idea that he called characteristica universalis, commonly translated as the universal 
characteristic or, more accurately perhaps, as a universal language of characterization. 
In French, he sometimes referred to it as la spécieuse générale. Like John Wilkins (1668) 
a few years earlier, Leibniz imagined a language of precisely defined concepts that would 
be used to express mathematical, metaphysical, and scientific truths. In about 1678 he 
wrote a Latin fragment under the heading (in English translation) A General Language, 
which was the start of an attempt, never completed, to compile an encyclopaedia of uni- 
versal knowledge. He returned to this project at various times later in his life. In about 
1702 he began an onomasiological work, likewise unfinished, for which he wrote about 
fifty pages of terms and definitions, known as the ‘table of definitions’ (see Couturat 
1903). A brief extract (in English translation) will give the flavour of the approach: 


A body is an extended resisting thing. 

A spirit is an incorporeal thinking substance. 

A thinking thing is that which is conscious of its actions, or has a reflexive act. 

Man is a thinking animal, or is a thinking thing endowed with an organic body. 

An organism is a perfect natural machine, or one of which any part is a machine. 


—Extracts translated from Latin by Emily Rutherford in Hanks (2008) 


It seems clear from this that Leibniz is proceeding from the top down, that is, from the 
general to the particular, and that he is concerned with concepts in an ordered universe 
rather than words in a language. He has no interest in the vagaries of everyday language 
use. In fact, he does not care whether words actually exist to denote the concepts that 
he needs in his system: lexical gaps are easily plugged with phrases such as ‘a thinking 
thing’ and (elsewhere) ‘an existing thing’, ‘a mathematical concrete thing’, and so forth. 

The next extract shows more clearly Leibniz attempting to construct a succinct, con- 
trastive statement of necessary and sufficient conditions for each of his concepts. This 
extract is from a section that defines terms denoting different kinds of motion: 


To fly is to move oneself in the air by rowing without a solid support. 

To swim is to do the same in water. 

To crawl is to move oneself forward on dry land without feet. 

To walk is to move oneself forward by foot. 

To go is to be in motion toward a location. To come is to go where one is expected. 

An animal leaps when it elevates itself from a support, to which it will immediately 
return. 

To flee is to withdraw because of fear. To follow is to come near to one who is fleeing. 
He leads who makes [others] advance with him. 


These fragments must be regarded as no more than jottings towards a larger, more 
consistently organized work, which was never completed. Nevertheless, the general approach 
is clear. Related terms are defined in such a way that each contrasts with the others. 
Definitions start with a genus term (e.g. ‘to move’), followed by differentiae (‘by rowing 
without a solid support’). 
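
The genus-plus-differentiae structure of these definitions, and the requirement that related terms contrast with one another, can be made concrete in a short sketch. It is offered only as an illustration under stated assumptions: the Definition record and the function is_contrastive are hypothetical, and the way each of Leibniz's definitions is split into separate differentiae below is one reader's interpretation (for instance, ‘to do the same in water’ is read as replacing ‘in the air’ with ‘in water’).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    term: str
    genus: str
    differentiae: frozenset


def is_contrastive(definitions):
    """Return True if no two terms share both genus and differentiae."""
    seen = set()
    for d in definitions:
        key = (d.genus, d.differentiae)
        if key in seen:
            return False
        seen.add(key)
    return True


# An interpretive encoding of four of the motion definitions quoted above.
MOTION = [
    Definition("fly", "move oneself",
               frozenset({"in the air", "by rowing", "without a solid support"})),
    Definition("swim", "move oneself",
               frozenset({"in water", "by rowing", "without a solid support"})),
    Definition("crawl", "move oneself",
               frozenset({"forward", "on dry land", "without feet"})),
    Definition("walk", "move oneself",
               frozenset({"forward", "by foot"})),
]

contrast_holds = is_contrastive(MOTION)
```

On this encoding, fly and swim differ in exactly one differentia, the medium. A check of this kind says nothing about whether the definitions match actual usage, which is the question taken up in the rest of the chapter.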

Behind these definitions lurks the dictum about identity already mentioned above: 
Eadem sunt quorum unum potest substitui alteri salva veritate (‘Two things are the 
same if one can be substituted for the other without affecting the truth’). This ‘substi- 
tutability’ approach to word-sense definition is still widely accepted as the standard 
model in almost all modern English dictionaries. Only Cobuild adopts a systematically 
different approach, placing the emphasis instead on phraseology and pragmatics—the 
contexts in which words are used. 

Thinkers from Leibniz to Frege and Russell were concerned by the apparent paradox 
of synonymy. Assuming that a language is (or can be made to be) an orderly collection 
of concepts, then if a term denoting a concept is defined accurately by another term (a 
synonym or paraphrase), the two terms have identical meaning, so nothing has been 
explained by stating a synonym. On the other hand, if the meaning of the synonym is 
different, then the definition is incorrect. Frege’s example (1892) was: 


(1) Hesperos (the evening star) is Phosphoros (the morning star). 


Hesperos is an ancient Greek name for a heavenly body seen shining brightly in the 
evening after sunset; Phosphoros is an ancient Greek name for a heavenly body seen 
shining brightly in the morning before dawn. At one time, they were thought to be 
two different objects, but eventually the ancient Greeks came to accept the Babylonian 
view that they are two different names for one identical object—which we now know as 
the planet Venus. This example shows that identity can be discovered. Frege used it to 
demonstrate that the sense of a term (Sinn) is different from its reference (Bedeutung), 
and that sense (Sinn) is a purely linguistic matter, while reference (Bedeutung) relates to 
something in the world outside language. 

A rather different example is offered by Wiggins (2007, 2010): 


(2) Furze is gorse. 


The two terms have identical reference, but of course number 2 is not necessarily a tau- 
tology. It is, in fact, perfectly explanatory for anyone who does not know the meaning 
of the term furze, provided that they do know the meaning of the term gorse. So here, 
the paradox can be resolved (or sidestepped) by moving on from conceptual identity to 
linguistic reality. Why, in such cases, does English have two terms with identical 
reference? Are there two subgroups of English speakers, one using furze, the other using 
gorse? Then 2 would be just as explanatory as is 3 for someone who does not know the 
French word or 4 for someone who does not know the German word. 


(3) ajonc is gorse. 


(4) Ginster is gorse. 


Or perhaps there is some stylistic or register difference between furze and gorse? These 
are matters for empirical investigation. Fallacies to be avoided are the assumption that 
meanings in natural language consist of homogeneous Leibnizian concepts or that a 
community of language users is a homogeneous whole; also that each word in a lan- 
guage has a unique set of references. There is often considerable overlap. 

Leibniz was a powerful influence on subsequent philosophers and logicians, 
including Frege, Russell, and the early Wittgenstein. We shall not attempt here to 
trace and evaluate this rich thread of development in Western semantic theory, for 
it would be a digression—indeed, a distraction—from our goal, which is to explore 
approaches to the definition of words in natural language. Frege and Russell, for 
example, were very great thinkers, and Russell in particular made use of some natu- 
ral language examples to illustrate his theories, but neither of them undertook any 
serious empirical investigation of how language is actually used by ordinary peo- 
ple to make meanings. In other words, they were concerned with the definition of 
concepts and logical relations, and did not distinguish between this and meaning in 
ordinary language. 

Because his influence as a philosopher and logician was so great, Leibniz’s approach 
to defining the meaning of concepts was assumed to also be applicable to studying 
the everyday meanings of words, and was not seriously challenged until Wittgenstein 
(1953), for which see Section 7.9 below. The empirical study of natural language on a 
synchronic basis had to wait until Saussure, and the detailed empirical study of word 
meaning as an aspect of natural language had to wait even longer, until the advent of 
corpus linguists such as John Sinclair. In truth, it has hardly begun. 


7.5 THE GENERATIVE LEXICON 




Quite recently (1995) the computational linguist James Pustejovsky, following the
philosopher J. M. Moravcsik (1975), revisited Aristotle and proposed a basic apparatus of
four elements (called qualia; singular quale) for the analysis of concepts and proposi- 
tions. The four qualia are: 


• The formal (roughly equivalent to genus term): what sort of thing is it?
• The constitutive: what are its components?
• The agentive: how did it originate?
• The telic: what is its purpose?


Not every concept has all four qualia. For example, only artefacts—manufactured 
objects—and (arguably) a small number of other classes of entities such as domesti- 
cated animals have a telic. (It makes no sense to ask, what is the purpose of a mountain 
or a mouse.) And on the other hand, Pustejovsky has shown that many classes have 
more than one formal: for example, the category book denotes both a set of physical 
objects (you can prop a door open with a book) and a set of information sources (you can 
read a book). 
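
The qualia can be pictured as four optional slots attached to a concept. The following Python sketch is offered only as an illustration: the class name Qualia, the helper has_telic, and the slot fillers given for book and mountain are assumptions made here, not part of Pustejovsky's own formalism.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Qualia:
    """Qualia structure for one concept; any slot may be left empty."""
    formal: List[str] = field(default_factory=list)        # what sort of thing is it?
    constitutive: List[str] = field(default_factory=list)  # what are its components?
    agentive: Optional[str] = None                         # how did it originate?
    telic: Optional[str] = None                            # what is its purpose?


# Illustrative entries; the slot fillers are assumptions, not quoted analyses.
book = Qualia(
    formal=["PHYSICAL OBJECT", "INFORMATION SOURCE"],  # more than one formal
    constitutive=["pages", "cover"],
    agentive="written and printed",
    telic="to be read",
)
mountain = Qualia(formal=["PHYSICAL OBJECT"], constitutive=["rock"])  # no telic


def has_telic(q: Qualia) -> bool:
    """True if the concept has a purpose recorded in its telic slot."""
    return q.telic is not None
```

On this representation, asking for the purpose of a mountain simply finds an empty slot, while book carries two formals side by side.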

Pustejovsky argues that a ‘sense-enumerative lexicon’ is an impossibility, because of 
the dynamic, generative nature of the lexicon: it is not possible to define all the mean- 
ings of a word, because words are used creatively and have different facets of meaning 
in different contexts; to some extent these different facets are activated or foregrounded 
by different qualia. Thus the qualia go some way towards accounting for lexical ambi- 
guity and for lexical creativity. Moreover, empirical analysis of vast quantities of text is 
not sufficient to guarantee that a sense-enumerative lexicon would be exhaustive, for 
some words may have potential meanings that have never actually been used. 

Pustejovsky proposes that accounting for a meaningful utterance needs at least four 
levels of representation: 


• Argument structure, specifying the number and type of logical arguments (sometimes
called ‘players’). For example, a verb of movement will have two or three
arguments or players: the thing that moves, the direction in which it moves, and
(if it does not move under its own steam) an agent that causes it to move. Direction
may be further subdivided into point of origin, path, and goal.
• Event structure: what kind of event is being spoken about? For example, is it an
interaction between people, is it a process in the natural world, or is it a state of affairs?¹
• Qualia structure: formal, constitutive, telic, or agentive, as discussed above.
• Lexical inheritance structure: generative lexicon theory depends on a system of
semantic types representing the ‘formal’ quale of every meaning of every word. It 
turns out to be an empirically verifiable fact that these formals are organized hier- 
archically; their properties can be inherited. So, for example, a spade is a TOOL. 
That is the formal of the word or concept spade. TOOL inherits the properties of its 
superordinate semantic types: ARTEFACT and, above that, PHYSICAL OBJECT. 
Thus, if we learn that an unfamiliar word (let us imagine, lopat) denotes a kind of 
spade (no doubt with differentiae that distinguish it from spade, shovel, entrenching 
tool, trowel, etc.), we can base our future discourse about lopats on the fairly reliable 
assumption that a lopat is a manufactured object (an artefact) and that it is a physi- 
cal object (you can bump into it). 
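
The inheritance of properties through semantic types can be sketched in a few lines of Python. The three types and their ordering come from the spade example above; the dictionary names, the function names, and the property glosses attached to each type are illustrative assumptions.

```python
from typing import Dict, List, Optional

# Each semantic type points to its superordinate type (None at the top).
PARENT: Dict[str, Optional[str]] = {
    "PHYSICAL OBJECT": None,
    "ARTEFACT": "PHYSICAL OBJECT",
    "TOOL": "ARTEFACT",
}

# Property glosses are assumptions made for illustration.
PROPERTIES: Dict[str, List[str]] = {
    "PHYSICAL OBJECT": ["you can bump into it"],
    "ARTEFACT": ["it is manufactured"],
    "TOOL": ["it is used to do a job"],
}

# The formal quale of each word; lopat is the imaginary word from the text.
FORMAL: Dict[str, str] = {"spade": "TOOL", "lopat": "TOOL"}


def type_chain(word: str) -> List[str]:
    """Return the word's semantic type followed by all superordinate types."""
    chain: List[str] = []
    current: Optional[str] = FORMAL[word]
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def inherited_properties(word: str) -> List[str]:
    """Collect the properties of every type in the word's chain."""
    props: List[str] = []
    for semantic_type in type_chain(word):
        props.extend(PROPERTIES[semantic_type])
    return props
```

Once lopat is recorded as a kind of TOOL, nothing further needs to be stated for it to count as manufactured and as something one can bump into.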


1 The inclusion of states among event structures leads to the rather surprising conclusion in
generative lexicon theory that states are classed as events. The term eventuality is sometimes used to 
encompass both states and events as commonly understood. 

The theory of semantic types and inherited properties is central to the generative
lexicon. These two notions are also central considerations in definition writing, as indeed
are the other levels of representation. All too often trainee lexicographers spend time
agonizing over one or two qualia (for example, the constitutive of a word like table: are
tables made of wood, metal, glass, or what? Do they necessarily have legs and if so how
many?) while overlooking others, for example the telic: tables are for putting things on.

Anyone setting out to compose or process definitions of concepts cannot do bet- 
ter than to start with Pustejovsky’s four qualia and his theory of semantic types. At 
the same time, it would be a mistake to assume that this kind of conceptual analysis 
is straightforwardly applicable to the meaning of words in ordinary language. This 
is a mistake made by most linguists and almost all dictionaries, so it is unlikely to 
be eradicated in the near future. It is a great starting point, but not a good endpoint. 
I shall argue in Section 7.10 below, following Sinclair, that some element of empirical
analysis of usage is needed, to ensure that our definitions are well focused on the nor- 
mal usage of words and are not distracted by attempts to cover all possible meanings, 
however remote. 


7.6 DEFINING IN TERMS OF A ‘NATURAL
SEMANTIC METALANGUAGE’


Let us return once more to Samuel Johnson, the founder of serious English lexicography.
Definition writing was the aspect of his work with which Johnson declared himself
to be least satisfied. His primary stated aim was ‘to collect the words of our language’. 
His second aim was etymological and historical: discovering the origins and earliest 
meanings of English words. Definition writing came third. On definitions and expla- 
nations, he commented: 


To interpret a language by itself is very difficult; many words cannot be explained by
synonimes, because the idea signified by them has not more than one appellation; 
nor by paraphrase, because simple ideas cannot be described. When the nature of 
things is unknown, or the notion unsettled and indefinite, and various in various 
minds, the words by which such notions are conveyed, or such things denoted, will 
be ambiguous and perplexed. ... Things may be not only too little, but [also] too 
much known, to be happily illustrated. To explain, requires the use of terms less 
abstruse than that which is to be explained, and such terms cannot always be found; 
for as nothing can be proved but by supposing something intuitively known, and 
evident without proof, so nothing can be defined but by the use of words too plain 
to admit a definition. (Dictionary of the English Language, 1755: Preface: v) 


This anticipates very closely the motivation of the ‘Natural Semantic Metalanguage’ 
(NSM) of Goddard and Wierzbicka (2002), developed by Wierzbicka (1985, 1987) and 
tested by Goddard (2008) and others on a wide variety of unrelated languages. NSM
proposes that the meaning of all terms in all languages can be reduced to and expressed in
terms of just a handful of semantic primitives or ‘primes’, which are so basic that they
cannot themselves be defined. This universal core is claimed to have a fully ‘language-like’
character in the sense that it consists of a lexicon of semantic primitives together with a
syntax governing how the primitives can be combined (Goddard 1998a). The idea itself
is far from new. In one form or another, it has been held by philosophers from the
seventeenth century onward.² It was given elegant expression by Arnauld and Nicole:


It would be impossible to define every word. For in order to define a word it is 
necessary to use other words designating the idea we want to connect to the idea 
being defined. And if we wished to define the words used to explain that word, 
we would need still others and so on to infinity. Consequently, we necessarily 
have to stop at primitive terms which are undefined. (Arnauld and Nicole 1662; 
tr. Buroker 1996)


Unlike previous proponents of the notion of semantic primes, Wierzbicka and 
Goddard have devoted considerable efforts to empirical investigation, compiling an 
inventory of the semantic primes that (they claim) are common to all languages. There 
are currently sixty-four of them (Table 7.1). 

What is a semantic prime, and why should anyone believe that such things exist? 
A recurrent theme in lexical analysis is that words with complex meanings are explain- 
able in terms of simpler words. For example, the primary senses of run, creep, crawl, 
scuttle, amble, ride, drive, climb, fall, and several other words can be explained as 
semantically differentiated variants of the basic notion ‘move’ or ‘go’. They differ as to 
manner of motion, direction, or other semantic feature(s). In NSM, move is a basic term
and therefore indefinable, suitable for use as the genus word in a definition, whereas 
a word such as motility or perambulation would not be suitable for the same purpose. 
Each semantic prime is supposed to have one and only one sense. Polysemy and fuzzi- 
ness are not properties of NSM primitives. 

For the purposes of practical everyday lexicography, it is of course perfectly possible 
to construct a definition of move and indeed of any other ‘semantic prime’, but only 
at the cost of circularity in definition. Thus, for example, the first sense of move in the 
Concise Oxford English Dictionary (COED12) is: 


go or cause to go in a specified direction or manner.

The first sense of go is:


move from one place to another; travel. 
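
The circularity can be made visible mechanically. The sketch below, in Python, treats the two COED12 definitions as data and follows any headword that appears inside a definition. The function names and the crude tokenization are assumptions made for illustration, not a description of how any dictionary is actually checked.

```python
import re
from typing import Dict, List, Set

# The two first senses quoted above, keyed by headword.
DEFINITIONS: Dict[str, str] = {
    "move": "go or cause to go in a specified direction or manner",
    "go": "move from one place to another; travel",
}


def headwords_used(definition: str, headwords: Set[str]) -> Set[str]:
    """Return the headwords that occur as whole words in a definition."""
    tokens = set(re.findall(r"[a-z]+", definition.lower()))
    return tokens & headwords


def find_cycle(start: str, definitions: Dict[str, str]) -> List[str]:
    """Return a path start -> ... -> start if one exists, else an empty list."""
    headwords = set(definitions)

    def visit(word: str, path: List[str]) -> List[str]:
        for nxt in sorted(headwords_used(definitions[word], headwords)):
            if nxt == start:
                return path + [nxt]          # back where we began: a circle
            if nxt not in path:
                found = visit(nxt, path + [nxt])
                if found:
                    return found
        return []

    return visit(start, [start])


cycle = find_cycle("move", DEFINITIONS)
```

Here cycle holds the path from move through go and back to move, which is exactly the complaint a logician would raise.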


2 For an evaluation of this approach in the context of structuralist approaches to lexical semantics 
see also Geeraerts (this volume). 

Table 7.1 Proposed set of semantic primes

Category                   Primes
Substantives               I, YOU, SOMEONE, PEOPLE, SOMETHING/THING, BODY
Determiners                THIS, THE SAME, OTHER
Quantifiers                ONE, TWO, SOME, ALL, MANY/MUCH
Evaluators                 GOOD, BAD
Descriptors                BIG, SMALL
Intensifier                VERY
Mental predicates          THINK, KNOW, WANT, FEEL, SEE, HEAR
Speech                     SAY, WORDS, TRUE
Actions, events            DO, HAPPEN, MOVE, TOUCH
Existence and possession   THERE IS / EXIST, HAVE
Life and death             LIVE, DIE
Time                       WHEN/TIME, NOW, BEFORE, AFTER, A LONG TIME, A SHORT TIME,
                           FOR SOME TIME, MOMENT
Space                      WHERE/PLACE, HERE, ABOVE, BELOW; FAR, NEAR; SIDE, INSIDE;
                           TOUCHING
‘Logical’ concepts         NOT, MAYBE, CAN, BECAUSE, IF
Augmentors                 VERY, MORE
Taxonomy, partonomy        KIND OF, PART OF
Similarity                 LIKE

Source: From Wierzbicka and Goddard (2002)


Logically, we may complain that in this pair of definitions move is defined as ‘go’ and 
go is defined as ‘move’. However, while this is undoubtedly true, it misses the point. 
Dictionaries are practical tools; no one, least of all a logician, uses a dictionary to 
find out the basic meaning of a word. What COED12 is telling us is that in the first
place, move has a causative sense (‘move the furniture around’; ‘move the piano into
the dining room’) as well as an inchoative (‘The fish’s progress through the water
is rather slow, but it does move’), and that in the second place it is very often
complemented by an adverbial of manner or direction. Arguably, go might have been a
better choice for the name of this prime than move, but that is a matter of no
importance, because we are dealing here with conceptual meaning not word meaning. The
name of the prime simply does not matter. We should also note that COED12 offers
eleven more senses of the verb move and ten more senses of the verb go, plus a very
large number of idiomatic phrases and phrasal verbs, as well as some noun senses
for both words. Recording and differentiating these are among the main concerns of
practical lexicography. Lexicographers aim, in Wierzbicka’s wonderful phrase, to be
‘precise about vagueness’:


An adequate definition of a vague concept must aim not at precision but at vague- 
ness: it must aim at precisely that level of vagueness which characterizes the con- 
cept itself. (Wierzbicka 1985: 12) 

7.7 TAXONOMIC HIERARCHIES


The greatest adventures in the history of definitions resulted in the creation of enor- 
mous taxonomic hierarchies. Two approaches may be singled out for discussion: (1) the 
general taxonomic hierarchies of thinkers such as Wilkins, Roget, and Miller and 
Fellbaum, and (2) the botanical and zoological classifications of Carl Linnaeus. 

John Wilkins’s Essay towards a Real Character and a Philosophical Language 
(1668) is a work of extraordinarily ambitious scope. His main aim, which, alas, led 
to nothing, was to develop an entirely new conceptual writing system, analogous to 
that of Chinese, which could be realized phonologically in any spoken language. He 
was the first in a long line of scientific thinkers (others have included Leibniz, Frege, 
and Russell) who considered natural languages such as English to be inadequate 
for the expression of scientific concepts and who yearned for a more precise vehicle 
for scientific thought. Wilkins objected in particular to the fact that the meanings 
of words are vague and that they tend to change unpredictably. He also objected to 
alphabetic writing systems, which represent speech sounds rather than the mean- 
ings of concepts. He was much taken with the idea of a writing system that would 
represent concepts independently of speech sounds, a property that he attributed to 
Chinese ideographs. 

As a preliminary to developing his ‘universal character’, Wilkins set out to collect 
all known words and concepts and arrange them in a hierarchical order, ‘enumerating
and describing all such things and notions as fall under discourse’. The result is
nothing less than a vast ontology, a hierarchically arranged onomasiological network 
of English words, organized roughly according to genus terms and differentiae. The 
genus terms and differentiae are sporadic, not systematic, and are based on physical 
appearance and behaviour, for Wilkins lived before anyone had thought seriously 
about scientific taxonomical zoology and botany. This part of his Essay is a
precursor of Peter Mark Roget’s famous Thesaurus (1852), as Roget himself acknowledged,
and of the Princeton WordNet developed by Miller and Fellbaum in the 1980s and 
1990s. An example will illustrate Wilkins’s method and his organization of
concepts. Under the general heading ‘Of Beasts’, he lists various kinds of beasts and
eventually comes to ‘rapacious beasts of the cat-kind’, where he includes not only 
lion, tyger, leopard, lynx, and domestic cat, but also bear, ferret, polecat, stoat, weesle, 
beaver, and otter. Beside lion stands a one-word behavioural differentia, ‘roar’, and 
likewise beside cat, kitten, kitling stands mew. His lists are punctuated with brief 
observations (often only one word) about the physical appearance and behavioural 
differentiae of these various animals. The domestic cat, for example, is ‘an enemy to 
mice’. The beaver and otter are ‘amphibious’. 

Wilkins’s work makes fascinating reading (or rather, browsing), if only for the picture 
that it gives of a world view before systematic Linnaean classification of plants and ani- 
mals had been developed. It is worth bearing in mind that terms such as mammal were 
eighteenth-century inventions; that is to say, Wilkins lived in a world which did not see
the classificatory relationship between horses and dogs that we assume today is
commonplace, and would be even harder put to it than we are to decide whether a tortoise
should be classed as a reptile. The painstaking empirical analysis of flora and fauna that
provided the easy classifications of today’s world was not carried out until the
eighteenth and nineteenth centuries, long after Wilkins was dead.

A direct descendant of Wilkins’s Essay is Roget’s Thesaurus, which has appeared 
in many editions throughout the English-speaking world since its first appearance 
in 1852 (see further Kay and Alexander, this volume). This splendid practical work
has established itself primarily as a synonym finder, but in truth it is much more 
than that: it is a complete recension of Wilkins’s work in the light of
nineteenth-century science, informed by massive common sense. However, Roget’s Thesaurus does
not actually contain definitions (though many a lexicographer has had reason to be 
grateful to it when struggling to find the right word in a definition), so we will say no 
more about it here. 

A third onomasiological work must be mentioned before we move on: WordNet. 
WordNet is a comprehensive network of the semantic relations among English words. 
It makes no attempt to distinguish between word meaning in natural language and 
scientific concepts. Instead, the fine distinctions of modern science are faithfully 
observed. So the taxonomic hierarchy for canary, expressed as an IS-A hierarchy,
looks like this: 


canary IS A finch IS AN oscine IS A passerine IS A bird IS A vertebrate IS A chordate 
IS AN animal IS AN organism IS A living thing IS AN object IS A physical entity IS 
AN entity. 


At each step, synonyms are given if there are any (thus, as a synonym of vertebrate we 
find craniate), along with a definition. The definition of vertebrate or craniate is
‘(animals having a bony or cartilaginous skeleton with a segmented spinal column and a
large brain enclosed in a skull or cranium)’. These definitions are either perfunctory
(e.g. ‘canary: any of several small Old World finches’) or taken from some source
deemed to be authoritative. Thus, there is little to be learned from WordNet to aid us in
the pursuit of excellence in writing definitions of natural-language terms. Moreover, 
the hierarchical principle is carried out systematically. 
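
The IS-A chain quoted above can be represented as a simple lookup from each term to its superordinate. The Python sketch below hard-codes the canary chain rather than querying WordNet itself; the names HYPERNYM, is_a_chain, and is_a are hypothetical.

```python
from typing import Dict, List

# Each term maps to its immediate superordinate, following the chain in the text.
HYPERNYM: Dict[str, str] = {
    "canary": "finch",
    "finch": "oscine",
    "oscine": "passerine",
    "passerine": "bird",
    "bird": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
    "animal": "organism",
    "organism": "living thing",
    "living thing": "object",
    "object": "physical entity",
    "physical entity": "entity",
}


def is_a_chain(term: str) -> List[str]:
    """Return the term followed by every superordinate up to the top node."""
    chain = [term]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def is_a(term: str, ancestor: str) -> bool:
    """True if ancestor appears anywhere above term in the hierarchy."""
    return ancestor in is_a_chain(term)[1:]
```

The relation is transitive but not symmetric: a canary IS A bird on this model, whereas a bird is not a canary.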

WordNet is the favoured lexical resource among computational linguists, partly no 
doubt because it is free, but also because it is comprehensive, if not more than com- 
prehensive. It contains not only words that are in everyday use but also extremely rare 
words such as saltate (‘leap or skip, often in dancing’).³


3 Compare also discussion of WordNet and its heritage among structuralist approaches to lexical
semantics in Geeraerts (this volume). 

7.8 LINNAEUS: THE RETREAT TO NEW LATIN


The great Swedish life scientist Carl Linnaeus (1707-78), otherwise known as Carl von 
Linné, devised a hierarchical system for the classification of plants and animals—a 
binomial system, which means that the name of each class has two components: a 
genus and a species. In this system a canary is Serinus canaria. 

Part of Linnaeus’s genius was his recognition that natural languages do not provide 
adequate means for the scientific classification of plants and animals. It was therefore 
necessary to invent terms in an artificial language (New Latin) for use in naming spe- 
cies and genera, each of which could have an extensive description in a vernacular 
language. In Linnaeus’s taxonomical system (1735), the plant and animal kingdoms 
are divided into classes, which in turn are divided into orders, families, genera (singu- 
lar: genus), and species. In its present-day manifestation, this hierarchical nomenclature 
stands as an idealized representation of the evolution of species, in a Darwinian sense. 
The word ‘idealized’ is important here; it could even be equated with ‘approximate’. 

Common sense suggests that the definition of a term denoting an animal or bird, 
for example canary—a small yellow songbird—must be a simple matter. A thing either 
is a canary or it is not. Precision, it would seem, can be achieved by appealing to the 
Linnaean system: a canary is a member of the species Serinus canaria (Serinus is the 
genus name; canaria the species), family Fringillidae (finches), class Aves (birds). 

But it turns out that common sense is wrong. The distinction between canary and 
non-canary is more complicated than the non-ornithologist might imagine. There are 
not only several species of canaries within the genus Serinus (e.g. black-faced canary, 
forest canary), some of which are not yellow or do not have a sweet song, but also there 
are boundary cases and disputed cases. From time to time, after painstaking dissec- 
tion of specimens, taxonomic zoologists propose moving a species out of one genus or 
order into another. For example, some species formerly classed as outlying members 
of the family of siskins (Carduelis) are now regarded as canaries (genus Serinus), while 
some other species that were formerly regarded as canaries have moved the other way: 
they are now classed as siskins or other kinds of finches. Thus, the citril finch and the 
Corsican finch were formerly classed as canaries (genus Serinus) but are now placed 
in the genus Carduelis as Carduelis citrinella and Carduelis corsicana respectively. The 
higher up the classificatory tree we go, the more complexities of classification we find. 

Boundary cases such as these in the matter of definition do not affect the overarching 
fact that there are many individuals that are indisputably canaries: they are small birds, 
they are yellow, and they sing. Zoologists have taken to using terms like ‘true finches’ 
and ‘true canaries’ for the central and typical species in a genus or order, although more
appropriate adjectives in the light of modern semantic theory might be ‘prototypical’
or ‘stereotypical’.

It is, of course, of little or no benefit to ordinary dictionary users to offer them a New 
Latin term in place of a description or typification. This was part of the motivation 
for the editors of the New Oxford Dictionary of English (1998: second and subsequent
editions published as the Oxford Dictionary of English (ODE)) to devise a new style of 
explanation for natural-kind terms. It is instructive to contrast OED with ODE: 


OED: finch: A name given to many small birds of the order Passeres, esp. to those of the 
genus Fringilla or family Fringillidæ. †to pull a finch: to swindle an ignorant or
unsuspecting person (cf. to pluck a pigeon).


ODE: finch: a seed-eating songbird that typically has a stout bill and colourful plum- 
age. The true finches belong to the family Fringillidae (the finch family), which includes 
chaffinches, canaries, linnets, crossbills, etc. Many other finches belong to the bunting, 
waxbill, or sparrow families. 


For the definition of this word, OED relies heavily on three New Latin terms, which 
will mean little to most ordinary dictionary users. About the only information about 
the bird that such a user is likely to take away is the fact that finch is a name for many 
small birds. ODE, by contrast, offers four additional descriptive facts about typical 
finches (seed-eating; song; stout bill; colourful plumage) before going on to the New
Latin classification.

If we turn now to spider in OED, we find a much more informative entry. Once again,
the entry starts with a New Latin classification (including the defining word arachnid, 
which will not be of much use to many users), but it then goes on to offer a truly inform- 
ative description. 


OED: spider: One or other of the arachnids belonging to the insectivorous order 
Araneidæ, many species of which possess the power of spinning webs in which their
prey is caught. ... The cunning, skill, and industry of the spider, as well as its power of 
secreting or emitting poison, are frequently alluded to in literature. The various spe- 
cies or groups of spiders are frequently denoted by a distinctive premodifier, as bird- 
catching, crab-, cross-, diadem-, garden-, house-, jumping-, mason-, sedentary, spinning, 
trap-door-, wall-, wandering spider, etc.: see these words. 


This time, it is ODE that seems to be troubled by scientistic coyness, although much of 
the useful descriptive information gets in eventually (in a second sentence): 


ODE: spider: an eight-legged predatory arachnid with an unsegmented body consist- 
ing of a fused head and thorax and a rounded abdomen. Spiders have fangs which inject 
poison into their prey, and most kinds spin webs in which to capture insects. Order
Araneae, class Arachnida. 


As an example of a well-formed modern Aristotelian definition of a concept, we may 
cite the opening sentence of the Wikipedia article on birds (in which underlining indi- 
cates hyperlinked terms): 

Birds (class Aves) are feathered, winged, bipedal, endothermic (warm-blooded),
egg-laying, vertebrate animals. 


This appears to be a statement of necessary conditions, even though in fact some spe- 
cies of birds (e.g. kiwis) do not have feathers, while others (e.g. penguins) have only 
vestigial wings. Following Putnam, we may observe that a one-legged bird is still a bird, 
so the condition ‘bipedal’ is also optional. Such points are now widely dealt with by the 
use of hedges such as ‘predominantly’. Finches are ‘predominantly’ seed-eating. 

The point of all this is that Linnaean New Latin terminology provides a terminologi- 
cal system for the manipulation of concepts, without troubling the folk terminology of 
plants and animals. The citril finch is still a citril finch in ordinary language, regard- 
less of the fact that experts have moved it from one genus to another. The meaning and 
extension of New Latin terms can be discussed in ordinary natural languages without 
risk of terminological confusion. 

No such benefit extends to other areas of terminology. Consider, for example, the 
definition of the word second (a unit of time). In ordinary language, second means 
nothing more than a very short period of time, and attempts to define it more precisely 
are misguided and misleading. However, for certain kinds of scientific research, preci- 
sion is essential. Second is the base unit of time in the international system of weights 
and measures (SI unit), and has been defined by an international committee that meets 
periodically to consider such things:


the duration of 9,192,631,770 periods of the radiation corresponding to the transition 
between the two hyperfine levels of the ground state of the caesium 133 atom. 


Certain other SI units (e.g. kelvin, ampere, and mole) are less complicated because they 
do not have an ordinary-language meaning: they exist only in the specialist terminol- 
ogy of scientists and technologists. Here the lexicographer can gladly follow the scien- 
tist without fear of conflicting realities. 


7.9 FAMILY RESEMBLANCES 
AND PROTOTYPE THEORY 


In order to be able to move on, it is important to notice the difference between using 
words to classify objects in the world (both abstract and physical), on the one hand, and 
defining the meaning of words on the other. Dictionary definitions aim to define the 
meaning of words. The task of the writer of definitions in a dictionary is to summarize
(through paraphrases or otherwise) the conventional meaning that a word has in a
language, not to classify objects in the world. It turns out that, while it is quite possible to
create stipulative definitions of sets of objects in the world and to invent terms such as
mammal to denote such sets, analysis of word meaning in ordinary language requires 
a different approach. The main difference is that, unlike scientific concepts, the mean- 
ings of ordinary words in natural language cannot be defined by stipulating necessary 
and sufficient conditions for set membership. Set membership in natural language is 
fuzzy; moreover, words can have several meanings, which may be (or seem to be) com- 
pletely independent of one another or may be linked to one another by chains of ‘family 
resemblances’. These facts about word meaning were gradually established during the 
course of the twentieth century by philosophers of language and anthropologists,
principally Ogden and Richards (1923), Wittgenstein (1953), Putnam (1970), and Rosch (1972;
published under the surname Heider).

Ogden and Richards insisted on the importance of distinguishing the meaning of 
a term from the objects in the world that it might denote. A tree may consist of wood, 
but the meaning of tree does not: it consists of some sort of representation in the minds 
of language users, which they must be able to map onto similar insights in the minds 
of other users of the same language if effective communication is to take place. This 
apparently trivial insight reminds us that dictionary definitions take a number of short 
cuts that may be practical necessities but are fraught with the potential for semantic 
error. For example, the word canary is not a bird; it is, strictly speaking, a term denot- 
ing a type of bird, or possibly several different types of bird. The relevance of this point 
will, I trust, become clearer in Section 7.10 below. 

Wittgenstein (1953) famously argued that the meaning of a term should be seen as 
a ‘chain of family resemblances’. He used as an example the word game, arguing that 
it is a common error to assume that there must necessarily be some property that all
games share (‘otherwise they would not be called games’). Instead, if we look and see 
what kinds of game there are, we shall see that there are several different properties 
that games may have; each game or subset of games may have different combinations 
of these properties. Thus, in some games there may be physical activity, but there are 
other games (chess, for example) in which there is virtually none; in some games there 
is competition between two players or teams, but other games (patience, for example) 
are played by solitary individuals; and so on. It has been argued that all games are nec- 
essarily leisure activities or pastimes, but a glance at what goes on in a game of football 
between two professional teams should be sufficient to convince us that it can hardly 
be counted as ‘a leisure activity or pastime’. Wittgenstein argued that if we take the 
trouble to look and see what is going on (rather than making assumptions), ‘we see a 
complicated network of similarities overlapping and criss-crossing: sometimes overall 
similarities.’ He added: 


I can think of no better expression to characterize these similarities than ‘family 
resemblances’; for the various resemblances between members of a family: build, 
features, colour of eyes, gait, temperament, etc. etc. overlap and criss-cross in the 
same way.—And I shall say: ‘games’ form a family. (Wittgenstein 1953: §67) 

It was left to Putnam (1970) to point out that a three-legged tiger is still a tiger and that 
this invalidates the belief that word meaning can be defined by stating necessary and 
sufficient conditions, if (as seems reasonable) tiger is defined as ‘a four-legged animal’. 
Indeed, a satisfactory definition of tiger can and should state several other differentiae, 
too: terms such as ‘big cat’, ‘has four legs’, ‘wild’, ‘solitary’, ‘habitat in Asia’, and ‘has 
dark vertical stripes on reddish-orange fur’. If we were to encounter a tame sociable 
three-legged dwarf albino (colourless) tiger in a European zoo, it would still be a tiger, 
literally so called; this bizarre encounter would not be a reason for altering the termi- 
nology used to define tiger. Against this, it is hard to imagine conditions under which 
the word tiger might be used to denote an animal that is not a member of the genus of 
cats (in modern technical terminology, the family Felidae).⁴ 

Putnam was a philosopher of science and mathematics; the theory of word meaning 
was a sideline for him. But this aspect of his work is of the greatest importance for defi- 
nition writing. The bottom line of it is that the definition writer must define stereotypi- 
cal tigers, not all possible tigers. Putnam’s work was to be paralleled by a contemporary 
of his, the anthropologist and cognitive scientist Eleanor Rosch, who used the term 
‘prototypes’ rather than ‘stereotypes’. This term is the one that has caught on. Modern 
definition writers seek to identify prototypical concepts denoted by words. 

George Lakoff (1973) drew the attention of lexicographers to Eleanor Rosch’s emer- 
gent prototype theory, giving as an example the meaning of the word bird. He pointed 
out that in ordinary language use some birds, for example sparrows and hawks, are 
more ‘birdy’ than others, for example chickens and penguins. (See also Geeraerts, this 
volume.) 

Geeraerts (2010) offers an insightful account of Rosch’s prototype theory, discuss- 
ing the meaning of the word fruit. Probably every English speaker knows what a fruit 
is, and (if asked) would be able to mention central and typical examples: apples and 
oranges, perhaps. However, definition is a different matter. Responses by ordinary lan- 
guage users to requests for a definition are typically over-restrictive: for example, ‘Fruits 
are something you eat, they grow on trees, they are sweet and juicy.’ Alternatively, a 
botanist might say, more technically, that a fruit is a fleshy entity produced by a flowering 
plant or tree (forming from the flowers) and that the seeds that propagate the species 
are contained within the fruit. Against these two definitions (folk and technical): 


• Lemons are fruit but they are not sweet. They are juicy; they grow on trees; 
and more to the point, they contain the seeds. But one criterial feature is 
missing: sweetness. This does not stop us classing them as fruit. 

• Strawberries are edible, sweet, and juicy, and are often included as fruit on menus, 
but they do not grow on trees. Botanists tell us that, technically, strawberries are 
not fruit. This, however, has nothing to do with the fact that they do not grow on 
trees; it is because of the relationship between the edible part and the seeds by 
which the plant propagates itself. (A strawberry is technically classified as a ‘false 
fruit’ because it forms from the hypanthium, which holds the ovaries.) 

• Olives are edible and grow on trees, but they are neither sweet nor juicy. However, 
according to a technical botanical definition, an olive (the fleshy, oily, edible part 
surrounding the seed that enables the tree to propagate) is technically a fruit. 

• Are tomatoes fruit? Most varieties of tomato are juicy but they are not sweet, and 
do not grow on trees. Some people would class them as salad vegetables, but to 
a botanist they are indisputably fruit. 

• Acorns are seed-bearing. They are not juicy, not sweet, and not edible (not by 
humans, at any rate, though pigs seem to like them). Nevertheless, technically 
botanists would define an acorn as the fruit of the oak tree. 

• Many plants produce a fruit (botanically so defined) that is not edible and in some 
cases is even poisonous. 


4 Counterexamples to the notion that tigers are cats (‘beasts of the cat kind’, Wilkins called them) 
consist only of metaphors. A tiger economy, for example, is (or was) an economy exhibiting fierce 
competitive growth. 

Many other examples could be given. This wonderful mismatch between the technical 
definition of fruit by botanists and the meaning of the term among ordinary language 
users is a compelling reason for accepting the notion that a definition must seek to cap- 
ture prototypical criterial features, rather than necessary conditions for set member- 
ship. One conclusion that might be drawn from Geeraerts’s discussion of the meaning 
of the term fruit is that we should not allow ourselves to be bullied into accepting the 
notion that scientists know the ‘true’ meaning of the terms of our language. The folk 
definition and the technical definition are equally true, even though in some respects 
they are incompatible. The definition writer 
must accommodate both. 
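
The contrast between necessary conditions and prototypical criterial features can be made concrete with a small sketch. It is offered only as an illustration of the argument, not as a lexicographical tool: the feature names are my own simplifications of the folk definition quoted above, the feature sets follow the examples in the list, and the `edible` value for lemons and the apple entry are assumptions added for the sake of the comparison.

```python
# Illustrative sketch only: feature names and values are simplifications of the
# discussion above, not a botanical or lexicographical resource.
FOLK_PROTOTYPE = {"edible", "sweet", "juicy", "grows_on_trees"}

ITEMS = {
    # 'apple' is an assumed central example; 'edible' for lemon is an assumption.
    "apple": {"edible", "sweet", "juicy", "grows_on_trees", "contains_seeds"},
    "lemon": {"edible", "juicy", "grows_on_trees", "contains_seeds"},
    "strawberry": {"edible", "sweet", "juicy"},
    "olive": {"edible", "grows_on_trees", "contains_seeds"},
    "acorn": {"grows_on_trees", "contains_seeds"},
}


def meets_necessary_conditions(features, conditions=FOLK_PROTOTYPE):
    """Classical test: membership requires every condition to hold."""
    return conditions <= features


def typicality(features, prototype=FOLK_PROTOTYPE):
    """Prototype test: proportion of the prototypical features that are present."""
    return len(features & prototype) / len(prototype)


def rank_by_typicality(items=ITEMS):
    """Order items from most to least typical; ties keep their listed order."""
    return sorted(items, key=lambda name: typicality(items[name]), reverse=True)


if __name__ == "__main__":
    for name in rank_by_typicality():
        print(name, meets_necessary_conditions(ITEMS[name]), typicality(ITEMS[name]))
```

Under the classical test only the apple counts as a fruit, whereas the graded score keeps lemons, strawberries, and olives inside the category while marking them as less central. The equal weighting of features is a deliberate simplification; nothing in prototype theory requires it.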

Lexicographers have been slow to respond to the challenges issued by thinkers such 
as Wittgenstein, Putnam, Rosch, and Lakoff, taking refuge in silence—saying nothing 
about the issues raised—or, if challenged, in the defence that dictionary definitions are 
merely idealizations. Hanks (1994, 2006a, 2013), a lexicographer and the author of the 
present article, takes a related but slightly different approach, arguing that dictionary 
definitions can be seen as statements of ‘meaning potential’ rather than of meanings 
tout pur. He observes that fuzziness of word meaning is not merely an ‘imperfection’ 
of natural language, as assumed by thinkers from Wilkins and Leibniz to Frege and 
Russell, but in fact turns out to be a ‘design feature’ of natural language. The vagueness 
of word meaning boundaries in natural languages makes creativity in conversation and 
writing possible, that is, language users can use words in new ways and say new things 
in new circumstances. This means that definitions in dictionaries must be seen as state- 
ments of semantic prototypes. It may, unfortunately, be inevitable that present-day 
readers of a definition (having been brought up in a Leibnizian world with expectations 
of certainties) will see a definition as implying boundaries, but the definer must see it 
differently: the boundaries are inevitably vague and fuzzy and must be acknowledged 
as such. 

All this leads to at least three conclusions for definers: 


1. Definers should aim to state meaning criteria that are typical conditions, not nec- 
essary conditions. 

2. Creative uses of words by particular writers should not be allowed to affect 
dictionary definitions unless there is sufficient evidence that they are established 
conventions. One or two counterexamples are insufficient evidence to deny the 
existence of a convention. 

3. On the other hand conventional figurative expressions can provide evidence for 
differentiae that must be mentioned in definitions. For example, if a person is said 
to be ‘as cunning as a fox’ or to be ‘a cunning old fox’, and if such an expression 
can be shown to be conventional, the definition of fox must find some way of indi- 
cating that foxes are proverbially cunning, regardless of whether this is an objec- 
tive scientific truth or not. 


7.10 SINCLAIR'S CHALLENGE 
TO TRADITIONAL NOTIONS OF DEFINITION 


In 1987 a new kind of English dictionary for foreign learners was published. This was 
the Cobuild project (COBUILD is an acronym for ‘Collins Birmingham University 
International Learner’s Database’). Unfortunately, neither Collins publishers nor 
Birmingham University have seen their way to a continuation of the Cobuild research 
programme. John Sinclair, the editor in chief, died in 2007.⁵ 


5 More information can be found in a special issue of the International Journal of Lexicography 
(21:3), September 2008, which was devoted to the lexicographical legacy of John Sinclair. 

As author of the present chapter, I must declare an interest here. I was the managing editor of 
the first edition of Cobuild (1987). In 1990, I joined Oxford University Press, with responsibility for 
‘current English Dictionaries’. I designed and—with Judy Pearsall—edited the New Oxford Dictionary 
of English, later re-titled the Oxford Dictionary of English (ODE), the only corpus-based dictionary 
of English for native speakers. It is an account of the present-day meaning of English words based on 
evidence of present-day usage, as well as their historical meanings, but the wording of definitions is 
largely traditional. The decision was taken that the mass market of British dictionary buyers was not 
ready for radical innovations in defining style. 

It should be clear that ODE is very different from the much larger and more famous historical record 
of English that goes under a similar title, namely the Oxford English Dictionary (OED), which, being 
based on historical principles, places the oldest and sometimes obsolete sense of a word first. For 
example, OED defines the word camera first as ‘a vaulted room’ and next as ‘the treasury of the papal 
curia’. ODE’s first definition is ‘a device for recording visual images in the form of photographs, movie 
film, or video signals’. It should also be made clear that ODE is not related to the Oxford Advanced 
Learner's Dictionary of Current English (OALDCE), which takes a very different approach to definition 
and language description, placing a higher value on teachers’ intuitions than on evidence of usage. 
Oxford University Press indeed pays heed to the time-honoured marketing adage, ‘If you are the 
market leader, compete with yourself.’ 

More is said about the Cobuild dictionary elsewhere in this volume, especially in the 
chapters by Heuberger, Moon, and Geeraerts. Here, I will focus on Sinclair’s theoretical 
position vis-à-vis definition, which inevitably had a profound effect on the defining 
style of the dictionary. This theoretical position is best summarized in the following 
quotation: 


Many, if not most meanings require the presence of more than one word for their 
normal realization. (Sinclair 1998: 4) 


If most meanings require the presence of more than one word, then attempting to 
define the meaning of each word in isolation would seem to be a doomed goal. As men- 
tioned above, Hanks (1994) proposes a way out of the difficulty: namely that dictionar- 
ies must be seen as containing, not definitions of word meanings, but rather accounts 
of the meaning potential of words. This sop to the lexicographer’s conscience is all very 
well as far as it goes, but it highlights the necessity of research into the circumstances 
under which each aspect of a word’s meaning potential may be realized. So far, such 
research has not been undertaken on anything like a large enough scale to produce 
convincing results. Cobuild took a tentative step in this direction by adopting a policy 
of explaining words in context (giving prototypical phraseological examples) rather 
than the word in isolation, but there the matter rests. Thus, instead of saying that fly 
means ‘to move oneself in the air by rowing without a solid support’, Cobuild conflates 
the definiendum and the definiens in definitions such as the following: 


2. When a bird, insect, or aircraft flies, it moves through the air. 
and 
3. If you fly somewhere, you travel there in an aircraft. 


Cobuild here is making a distinction between the activity event type and the movement 
event type. It will be obvious to most readers that there is considerable overlap between 
the activity and the movement from one place to another achieved by the activity. But, 
more importantly, Cobuild here is explaining the difference in verb meaning between 
sentences that have fly with a human subject and sentences where the subject denotes 
a bird, an insect, or an aircraft. Distinctions of this sort are made systematically in the 
Cobuild dictionary. However, further research is needed to determine whether in fact 
different phraseological classifications and defining techniques should be used for dif- 
ferent word classes. Hanks (2012) lays out examples showing one possible way in which 
corpus-driven, phraseologically based accounts of the meaning of predicators (verbs 
and predicative adjectives) might differ from those used for nouns. 
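
The way in which the two Cobuild explanations of fly are keyed to the kind of subject can be sketched as a simple lookup. This is a hypothetical illustration only: the semantic type labels and the selection procedure are my assumptions and do not represent Cobuild's data or editorial method; only the two explanation texts are taken from the dictionary entries quoted above.

```python
# Hypothetical sketch: the semantic types and the lookup are illustrative
# assumptions, not Cobuild's actual data or procedure.
FLY_SENSES = [
    # (semantic types of the subject, Cobuild-style explanation)
    ({"bird", "insect", "aircraft"},
     "When a bird, insect, or aircraft flies, it moves through the air."),
    ({"human"},
     "If you fly somewhere, you travel there in an aircraft."),
]


def explain_fly(subject_type):
    """Return the explanation whose subject types include subject_type, else None."""
    for subject_types, explanation in FLY_SENSES:
        if subject_type in subject_types:
            return explanation
    return None


if __name__ == "__main__":
    print(explain_fly("bird"))
    print(explain_fly("human"))
```

A subject type that matches neither pattern returns nothing, which is the appropriate behaviour for a record of norms: an unmatched use is either a different sense or an exploitation, not something the entry claims to cover.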

Sinclair’s main theoretical contribution was what he called the idiom principle, 
which makes a distinction between the ‘terminological tendency’ of words, according 
to which they have meanings that relate to the world outside language, and their 
phraseological tendency, according to which a user’s choice of a word is determined by 
its collocations and syntagmatic preferences. 
At his most provocative, Sinclair says: 


A text is a unique deployment of meaningful units, and its particular meaning is not 
accounted for by any organized concatenation of the fixed meanings of each unit. 
This is because some aspects of textual meaning arise from the particular combina- 
tions of choices at one place in the text and there is no place in the lexicon-grammar 
model where such meaning can be assigned. Since there is no limit to the possi- 
ble combinations of words in texts, no amount of documentation or ingenuity will 
enable our present lexicons to rise to the job. They are doomed. (Sinclair 2004: 134) 


As a matter of fact, it can be argued that ‘our present lexicons’ do quite a good job of 
presenting basic meanings, especially for concrete nouns. What Sinclair has observed 
is that something is missing: our present lexicons fail to deal with aspects of text mean- 
ing that depend on words in combination. This fact has also been noticed by construc- 
tion grammarians in the American tradition (e.g. Fillmore, Kay, and O’Connor 1988; 
Goldberg 1995). 

We must supplement Sinclair's insights by distinguishing ‘possible combinations’ 
from ‘normal combinations’. Thus, although the number of possible combinations may 
in principle be limitless, as Sinclair says, the number of probable combinations of each 
word (its collocational preferences) is limited, and can be grouped around a few 
phraseological prototypes. It is surely true that the meaning of a text is ‘not accounted for 
by any organized concatenation of the fixed meanings of each [lexical item]’. This fact 
provides some hope that the number of definitions needed to account for a word’s nor- 
mal uses is not open-ended or intolerably large. However, it also provides a principled 
reason to believe that the goal of defining all possible uses of a word is unachievable. 

Two points may be emphasized here. First, language is highly patterned, and these 
patterned concatenations are reflected in normal usage as found in texts and collected 
in large corpora. Each word is associated with patterns of phraseology that occur again 
and again and again in different texts. Discrete meanings can very often be assigned 
to such patterns with greater confidence than to words in isolation. This should mean 
that definitions that take account of these patterns will be clearer and sharper than 
those that attempt to treat the word in isolation. Unfortunately, with the exception 
of Cobuild, dictionary definitions that account for words in context are few and far 
between. A great deal of work on contextualization of word meaning remains to be 
done. 
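
The claim that each word is associated with recurrent patterns can be illustrated by counting the words that occur near it. The following is a minimal sketch under stated assumptions: the five-sentence corpus is invented for the purpose, the function name and window size are arbitrary choices of mine, and raw co-occurrence counts stand in for the statistical measures that real corpus analysis would use.

```python
from collections import Counter

# Toy corpus invented for illustration; a real study would use a large corpus.
TOY_CORPUS = [
    "the bird flies through the air",
    "an aircraft flies through the air",
    "we fly to paris tomorrow",
    "they fly to rome every year",
    "time flies",
]


def collocates(node_forms, sentences, window=2):
    """Count words occurring within `window` tokens of any form in node_forms."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token in node_forms:
                left = tokens[max(0, i - window):i]
                right = tokens[i + 1:i + 1 + window]
                counts.update(left + right)
    return counts


if __name__ == "__main__":
    for word, freq in collocates({"fly", "flies"}, TOY_CORPUS).most_common(5):
        print(word, freq)
```

Even in so small a sample, through and to recur after the verb, hinting at the two patterns (movement through the air, travel to a destination) that a phraseologically based definition would separate. Whitespace tokenization and the absence of lemmatization are simplifications.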

Second, the extent to which the ‘fixed meaning of each unit’ varies according to con- 
text is itself a variable. The meaning of some words is highly contextually dependent, 
but other words, especially terms denoting physical objects (e.g. elephant and tooth- 
pick), have a more identifiable meaning in isolation. 

Sinclair (2010, his last work, published posthumously) proposes that lexicographers 
ought to redefine their traditional notion of the definiendum, for in many cases 
a phraseological entry may be more appropriate than a single word. It is not always 
clear how this should be done. Some form of prototype theory may well be helpful. 
This in turn will entail redefining user expectations. Corpus analysis predicts probable 
usage and meaning; it does not and cannot aim to define all possible uses, and diction- 
ary users will have to accept this as a fact of language, even if they find it unpalatable. 
This implies that there is a massive educational task ahead for language teachers, and 
things are not made any easier by the fact that most of them are currently pointing in 
the wrong direction: that is, they still expect definitions to be statements of necessary 
and sufficient conditions. 

No amount of corpus evidence can tell us what cannot occur, so Sinclair’s approach, 
like that of any corpus linguist, cannot concern itself with all possibilities, but only 
with predicting linguistic events probabilistically. This, though, may be the best that a 
definer can hope to do by way of helping dictionary users to understand the meaning 
of a word. 


7.11 FILLMORE’S FRAME SEMANTICS 


Charles Fillmore has made at least three important contributions to linguistic theory 
with a semantic component: case grammar, frame semantics, and construction gram- 
mar, each of which represents a plank in a possible bridge between syntax and lexical 
semantics. During the development of the FrameNet project, he worked closely with 
the lexicographer Sue Atkins. Here, I summarize aspects of Fillmore’s theory of frame 
semantics that are relevant to definition writing, from its source in case grammar to its 
practical realization in FrameNet.⁶ (See also Geeraerts, this volume.) 

Frame semantics originated in case grammar (Fillmore 1968), in which every verb 
is identified as selecting a certain number of basic cases, which form its ‘case frame’. 
For example, give selects three cases: Agent (the person doing the giving), Benefit (the 
thing given), and Beneficiary (the person or entity that receives the Benefit); go selects 
two cases: Agent and Path (more specifically, subdivided into Source, Path, Goal); 
break selects three cases: Agent, Patient (the thing that gets broken), and Instrument 
(the object used to do the breaking, for example a hammer). A problem for definers is 
that these cases may appear in different syntactic positions or may even be absent alto- 
gether. Thus, the ‘Patient’ may appear both as the direct object of a causative verb such 
as break and as the subject of the same verb used intransitively (inchoatively): 


(5) Janet broke the cup. 
(6) The cup broke. 


6 <http://framenet.icsi.berkeley.edu/>. More information can be found in a special issue of the 
International Journal of Lexicography (16:3, September 2003), which was devoted to FrameNet and 
frame semantics. 

The ‘Benefit’ and the ‘Beneficiary’ may swap positions in relation to a verb such as 
shower or heap, according as the preposition used is with or on: 


(7) He showered her with gifts. 
(8) He showered gifts on her. 


An adequate definition must reflect such alternations. 

In frame semantics, then, frames are conceptual structures involving a poten- 
tially large number of lexical items, not just individual meanings of individual words. 
The claim is that to define the meaning of a word satisfactorily, it is necessary to take 
account of the whole bundle of essential knowledge that relates to it. For example, to 
define sell, it is necessary to know about the ‘frame’ of commercial transactions, with 
frame elements Seller, Buyer, Goods [alternatively, Service], and Money. It is also 
necessary to state the relations between Money and Goods; between Seller, Goods, and 
Money; between Buyer, Goods, and Money; and so on. 

According to Fillmore and Atkins (1992): 


A word’s meaning can be understood only with reference to a structured back- 
ground of experience, beliefs, or practices, constituting a kind of conceptual 
prerequisite for understanding the meaning. Speakers can be said to know the 
meaning of a word only by first understanding the background frames that moti- 
vate the concept that the word encodes. Within such an approach, words or word 
senses are not related to each other directly, word to word, but only by way of their 
links to common background frames and indications of the manner in which their 
meanings highlight particular elements of such frames. (Fillmore and Atkins 
1992: 76-7) 


Each FrameNet frame is populated by several lexical units and is supported by selected 
corpus lines, which have been annotated. A lexical unit is a pairing of a word with a 
meaning. Frame elements are entities that participate in the frame. Different senses 
of polysemous words belong to different frames. A group of lexical units (words and 
multi-word expressions (MWEs)) is chosen as representative of a particular frame. For 
each selected lexical unit, a concordance is created from a corpus, and sample concord- 
ance lines are selected and annotated. Labels (i.e. names) are created for each of the 
frame elements. 

Fillmore (2006) discusses the example of the ‘Revenge frame’. The following lexical 
items are identified as participating in this frame: 


verbs: avenge, revenge, retaliate; get even, get back at; take revenge, exact retribution 
nouns: vengeance, revenge, retaliation, retribution 


adjectives: retaliatory, retributive, vindictive 


The frame elements are: 

Offender (O), Injured Party (IP), Avenger (A) [may or may not be identical to the 
Injured Party], Injury (I) [the offence], Punishment (P) 


The relationships (which could be regarded as laying foundations for definitions of the 
terms involved) are summarized as follows: 


O has done I to IP; A (who may be identical to IP), in response to I, undertakes to harm 
O by P. 
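
The Revenge frame as summarized above can be represented as a small data structure. This is a sketch for illustration only and does not reproduce the FrameNet data format: the class `Frame` and the functions `evokes` and `instantiate` are hypothetical names of my own, and the participants in the usage example are invented. The lexical units, frame elements, and relationship summary are those listed above.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """A toy stand-in for a semantic frame (not the FrameNet data format)."""
    name: str
    elements: dict        # abbreviation -> frame element label
    lexical_units: dict   # word class -> tuple of lexical units
    relation: str         # template stating how the elements relate


REVENGE = Frame(
    name="Revenge",
    elements={"O": "Offender", "IP": "Injured Party", "A": "Avenger",
              "I": "Injury", "P": "Punishment"},
    lexical_units={
        "verb": ("avenge", "revenge", "retaliate", "get even", "get back at",
                 "take revenge", "exact retribution"),
        "noun": ("vengeance", "revenge", "retaliation", "retribution"),
        "adjective": ("retaliatory", "retributive", "vindictive"),
    },
    relation=("{O} has done {I} to {IP}; {A}, in response to {I}, "
              "undertakes to harm {O} by {P}."),
)


def evokes(frame, expression):
    """True if the expression is listed among the frame's lexical units."""
    return any(expression in units for units in frame.lexical_units.values())


def instantiate(frame, **fillers):
    """Fill the relation template; every frame element must be supplied."""
    missing = sorted(set(frame.elements) - set(fillers))
    if missing:
        raise ValueError(f"missing frame elements: {missing}")
    return frame.relation.format(**fillers)


if __name__ == "__main__":
    print(evokes(REVENGE, "get even"))
    print(instantiate(REVENGE, O="the thief", IP="Janet", A="Janet",
                      I="a theft", P="a lawsuit"))
```

Because the same filler may be given for both Avenger and Injured Party, the structure accommodates the observation that the two may or may not be identical. A definition of any one lexical unit could then be drafted by stating which elements of the shared relation that unit foregrounds.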


7.12 CONCLUSION 


What, then, should lexicographers aim at, and what precautions should they bear in 
mind when writing definitions for a dictionary? 

The first need is to contrast the flexible and unstable nature of meaning in natural 
language with the stipulative definitions of scientific terminology. The meaning of con- 
cepts in science, technology, and other specialist fields can be stipulated precisely, but 
only by using the words of natural language in their normal, conventional senses. The 
prototypical meanings of ordinary words can be accounted for by using other ordinary 
words to identify criterial features of meaning (not necessary conditions for meaning). 

The definer must also bear in mind that every word meaning has the potential for 
creative exploitation, which is not a matter for definition. Thus, distinguishing 
exploitations from norms is an essential preliminary in sorting the evidence of uses of a word 
into meaning categories. Most large standard dictionaries fail to make this distinction; 
indeed, it may be said that the larger the dictionary, the greater the number of exploita- 
tions wrongly identified as conventions. Part of the problem is that there is no sharp 
dividing line between normal uses of a word and exploitations of its norms; instead, 
there is a large grey area in which some apparent exploitations may be domain-specific 
or even author-specific norms, while others may be budding or newly emergent norms. 
Good definitions are based on matching the definer’s own linguistic knowledge—his 
or her beliefs about a word’s meaning—with analysis of authentic usage. 

Failure to make a distinction between norms and exploitations can lead to some 
nasty surprises when real-language uses (i.e. corpora and citations) are analysed. 
Examples include OED’s definition of riddle (noun, 4) as ‘a hole made by a bullet’ and 
Merriam-Webster’s definition of newspaper as a verb meaning ‘to do newspaper work’ 
(W3). Both of these definitions are no doubt supported by at least one piece of evidence 
of authentic usage, but insufficient evidence of conventional usage. The OED entry is 
marked explicitly ‘Obs. nonce-w[or]d’, with a single example from 1880, ‘My mother 
... had dropped a tear over the riddle of a bullet in the flap’. An argument of the present 
chapter is that nonce words do not belong in dictionaries, even large dictionaries 
of record such as OED. This is because monolingual dictionary definitions should 
record only the conventional meanings of conventional words, and one piece of evidence 
alone is insufficient to determine the existence of a convention. The definer 
must find evidence of conventionality as well as authenticity. Recording nonce words 
and attributing a meaning to them may have seemed harmless enough in the days 
before James Joyce and the internet, but if the distinction between normal, conven- 
tional usage and exploitations of norms is not rigorously observed in the future, 
dictionaries will run the risk of becoming so overloaded with nonce usages (exploita- 
tions) that it will be impossible to see the wood (conventional meanings) for the trees 
(nonce exploitations). 

Learners’ dictionaries, on the whole, do a better job of identifying conventional 
meaning and use, but (with the exception of Cobuild) they all fall into a different trap: 
the trap of excessive reductionism in the analysis of texts, assuming that each word is 
a unique, isolatable linguistic item with one or more unique and definable meanings. 
Definers of the future will give more careful consideration to the relationship between 
meaning and phraseology: to what extent is the meaning of a given word independent 
of the context in which it is used? 

The next question that definers must ask themselves concerns context: phraseology 
and frames. How far should a dictionary definition go in taking account of (a) collocations 
(the context in which a word is used) and (b) semantic frames (the real-world context 
in which a word is used, whether or not that context is reflected in the collocates 
found in a large corpus)? These are both topics that are fairly new to lexicography. New 
approaches remain to be worked out. 

Questions that writers of traditional and innovative definitions alike must ask them- 
selves include: 


• What are the essential qualia that determine the most literal meaning of the 
definiendum? I.e., what sort of thing is it? What’s it for? What’s it made of? 

• How many sense distinctions need to be made? Are they mutually exclusive? Are 
they defined with sufficiently contrastive wording? Should Ockham’s razor be 
applied to reduce the number of definitions? 

• Can the definitions be made shorter and punchier? If a definition takes two bites at 
the cherry, do both bites really add something to the reader's understanding, and 
are they both as accurate as can be reasonably expected? 

• How does the meaning of a particular definiendum differ from its close synonyms 
(e.g. how is creep different from crawl)? 

• Can any use be made of scientific investigation of the concept being defined (if any 
has been done)? 


CHAPTER 8 




EXPLAINING MEANING IN 
LEARNERS’ DICTIONARIES 




ROSAMUND MOON 


8.1 INTRODUCTION 


In Chapter 7, Patrick Hanks examines the nature of the definition, discussing theory 
and practice in monolingual dictionaries. This chapter, something of an appendix to 
Hanks’s, looks at the special case of definitions in monolingual dictionaries for non- 
native speakers, and ways in which definitions are constructed or modified to meet 
the needs of those target users. Since the primary purpose of such dictionaries is peda- 
gogical, lexicographers seek to explain word usage rather than delimit semantics more 
formally, and so, following Hanks (1987),¹ I use the term explanation in preference to 
definition. 

The different approaches can be seen by comparing these definitions/explanations 
for native speakers (from ODE, 1998)² 


(1) a. contemptuous showing contempt; scornful 


b. contest (noun) an event in which people compete for supremacy in a sport or 
other activity, or in a quality 


c. continue [first two senses] persist in an activity or process; remain in 
existence or operation 


1 Writing about definitions/explanations in the first Cobuild (learners’) dictionary. 

2 In entries cited here, I have not replicated exact typography, and I have freely omitted elements 
such as codes and examples. Sample entries (mainly beginning with c) were selected from dictionaries 
on a semi-random basis, with no intention of any systematic critique of particular texts. Where entries 
are cited from electronic dictionaries, these were consulted in April 2013. 

with these explanations for learners (from OALD7, 2005). Lexis and structures are simpler, 
and at contemptuous, scornful is labelled as a synonym rather than forming part 
of the definiens: 


(2) a. contemptuous feeling or showing that you have no respect for sb/sth syn 
SCORNFUL 


b. contest (noun) a competition in which people try to win sth 


c. continue [first two senses] to keep existing or happening without stopping; to 
keep doing sth without stopping 


The general characteristics and history of learners’ dictionaries are discussed by 
Reinhard Heuberger in Chapter 3. Here my focus is more specific: their representa- 
tion and explanation of meaning. While there are monolingual learners’ dictionaries 
in languages other than English, they remain largely an anglophone phenomenon, and 
I will only consider practices in EFL/ESL dictionaries: that is, for learners of English as 
a foreign or second language. 


8.2 GENERAL PRINCIPLES AND TECHNIQUES 


EFL/ESL dictionaries aim at a global market and assume little about learners’ backgrounds 
beyond a certain proficiency with English, which may be quite closely related to 
their first language (Dutch, German), partially cognate (French, Spanish), or entirely different 
structurally and conceptually (Chinese, Japanese). This creates problems when formulating 
explanations, since they must be understandable without translation. Writing 
for a global market also means that there is little opportunity for addressing asymmetries 
between English and individual languages—unlike bilingual dictionaries, which can 
treat these explicitly. For a word such as know, where several European languages have 
different translations for different kinds of knowing (knowing a fact as opposed to knowing 
a person), this can be explained via polysemous senses, as in OALD7: 


(3) a. 1 to have information in your mind as a result of experience or because you 
have learned or been told it; 


b. 4 to be familiar with a person, place, thing, etc.


But most asymmetries are beyond the scope of monolingual explanations. For example,
those for eat and leg in OALD7


(4) a. to put food in your mouth, chew it and swallow it


b. one of the long parts that connect the feet to the rest of the body 
can do little to help German speakers uncertain about the equivalence of eat and essen/fressen,
or Malay speakers and the exact equivalence of kaki and leg/foot.

The next subsections look at techniques developed for explaining meaning 
to learners: see Hanks (this volume), Atkins and Rundell (2008: 405ff), Landau 
(2001: 153ff), and Svensén (1993: 112ff) for more general discussions of definition 
writing. 


8.2.1 The Language of Explanations 


Examples 2, 3, and 4 demonstrate the first principle of writing explanations for learners, 
that lexis and structures should be simple and clear, even at the expense of precision. 
The tradition of learners’ dictionaries is rooted in pedagogical practice and pedagogi- 
cal needs, originating in work carried out by Palmer, Hornby, and West on vocabulary 
and vocabulary teaching: this included identifying core wordlists for English, and also 
the collocational and grammatical patterns associated with common English words 
(see Cowie 1999, 2009a for a historical account). Hornby et al.’s Idiomatic and Syntactic
English Dictionary, published in Tokyo in 1942, republished by Oxford University Press
in 1948 (ISED/ALD1), was informed by extensive experience of EFL teaching, and its
introduction made defining policy clear: 


Definitions have been made as simple as possible. Where definition in easy, com- 
mon words was not practicable or satisfactory, pictures and diagrams have been 
supplied. (Idiomatic and Syntactic English Dictionary 1948: iv) 


The dictionary text itself drew heavily on COD3 as its basis, as Cowie (1999: 47ff) 
points out. Entries were adapted, and explanations shortened and unpacked, for 
example: 


(5) a. charitable


Liberal in giving to the poor; connected with such giving; wont to judge
favourably of persons, acts, & motives. (COD1 1911)


kind; helping the poor; judging actions in a favourable way. (ISED/ALD1 1948)
b. chide 


Make complaints, speak scoldingly, (esp. fig. of hounds, wind, &c.); scold, 
rebuke (COD1 1911) 


scold or rebuke, as to chide a pupil for being lazy (ISED/ALD1 1948)
c. chocolate 
(Cake of) cacao-seed paste (COD1 1911) 


a substance made from the seeds of cacao, manufactured with sugar into a 
sweet food, as a bar of chocolate (ISED/ALD1 1948) 
Seventy years on, ISED/ALD1 seems over-complex itself, and current editions of learners’
dictionaries unpack much further. These are from OALD7:


(6) a. charitable 


2 helping people who are poor or in need 3 kind in your attitude to other peo- 
ple, especially when you are judging them 


b. chide 
to criticize or blame sb because they have done sth wrong SYN REBUKE
c. chocolate 


a hard brown sweet food made from COCOA BEANS, used in cooking to add
flavour to cakes, etc. or eaten as a sweet/candy: a bar/piece of chocolate 


Explanations are longer and less elliptical, making fewer assumptions about learners’ 
familiarity with words such as scold, rebuke, cacao, manufactured; instead, they use 
more frequent items such as criticize, blame, made, while the non-core item COCOA 
BEANS is printed in small capitals to indicate a cross-reference. 

This simplification of lexis reflects the policy—now implemented in all major 
EFL/ESL dictionaries—of having strictly controlled defining vocabularies, lists of 
2,000-3,000 items through which all headwords and defined phrases or deriva- 
tives have to be explained: if explanations are impossible without additional 
words, these are presented as cross-references. For example, the first twenty items 
in OALD7’s 3,000-word defining vocabulary are a/an, abandon, ability, (un)able, 
about, above, abroad, absence, absent, absolute(ly), absorb, abuse, academic, 
accent, accept, (un)acceptable, access, accident, accidental(ly), accommodation. 
Furthermore, only common, central, or base senses of these high-frequency words 
are used: for example, for academic its educational sense, but not pejorative ‘hypo- 
thetical, unrealistic’. Thus as long as the users of a dictionary are broadly familiar 
with items in its defining vocabulary, they should be capable of understanding all 
its explanations.³ Different dictionaries use different methods to establish their
defining vocabularies, but typical factors include word frequency, and insights 
from EFL/ESL syllabus design. 
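
The defining-vocabulary policy described above can be pictured as a simple check on each explanation. The following is only a minimal sketch under stated assumptions, not a description of any publisher's actual tooling: the seven-word vocabulary and the function names `words_outside_vocabulary` and `mark_cross_references` are hypothetical, and the sketch matches exact word forms, ignoring both inflection and the restriction to base senses.

```python
import re

# Hypothetical, deliberately tiny stand-in for a defining vocabulary.
# Real lists run to 2,000-3,000 items; this set is an assumption made
# only so that the example is self-contained.
DEFINING_VOCABULARY = {
    "a", "brown", "food", "from", "hard", "made", "sweet",
}


def words_outside_vocabulary(explanation, vocabulary):
    """Return the words of an explanation that are not in the vocabulary,
    in order of first appearance and without repeats."""
    outside = []
    for word in re.findall(r"[a-z]+", explanation.lower()):
        if word not in vocabulary and word not in outside:
            outside.append(word)
    return outside


def mark_cross_references(explanation, vocabulary):
    """Upper-case every word outside the vocabulary, imitating the
    small capitals that print dictionaries use for cross-references."""
    def mark(match):
        word = match.group(0)
        return word if word.lower() in vocabulary else word.upper()
    return re.sub(r"[A-Za-z]+", mark, explanation)


explanation = "a hard brown sweet food made from cocoa beans"
print(words_outside_vocabulary(explanation, DEFINING_VOCABULARY))
print(mark_cross_references(explanation, DEFINING_VOCABULARY))
```

With this toy vocabulary, the two words outside the list are flagged and rendered in capitals, much as COCOA BEANS is in the OALD7 explanation of chocolate in (6c). A fuller treatment would need lemmatization and sense information, neither of which is attempted here.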

The introduction of a controlled defining vocabulary was one of the great innova- 
tions in the first edition of the Longman Dictionary of Contemporary English (1978): see 
Cowie (1999: 110ff) for discussion. As Cowie says, LDOCE1’s defining vocabulary had
precursors in the work of West and others. However, Hornby et al. explicitly rejected 
the idea in ISED/ALD1, saying: 


3 Applied linguistic research into vocabulary sizes suggests that a core vocabulary of c. 2,000 ‘word 
families’, supplemented by appropriate specialist wordlists, permits understanding of very high 
proportions of texts: see, for example, discussion in Nation 2001. 
It seemed better to make definitions on the general principle (1) that common words
should be explained by means of other common words (with the useful addition 
of synonyms which are less common) or by means of pictures and diagrams, and 
(2) that less common words... should be defined by the use of a wider vocabulary. 
(ISED/ALD1 1948: v)


But the success of LDOCE1’s controlled defining vocabulary, which was underpinned
by pedagogical research, and its popularity with teachers made the case unarguable:
eventually, all rival dictionaries fell into line. There were certainly some awkward
explanations in its first implementation in LDOCE1, not helped by complex typography
and punctuation:


(7) a. chicken 1 a hen (or perhaps cock¹ (1)), esp. when young but older than a CHICK (1)


b. clear (verb) 11 infml to earn (a large amount of money) as CLEAR¹ (10) profit or
wages


But subsequent editions smoothed these out, simplifying structures overall, as in 
LDOCE4: 


(8) a. chicken 1 a common farm bird that is kept for its meat and eggs


b. clear (verb) 18 informal to earn a particular amount of money after taxes have 
been paid on it 


Typographic design too has been simplified, and there are fewer metalinguistic
abbreviations.⁴

Current dictionaries also use navigation aids to make long, complex, and dense 
entries more accessible, for example indexes to senses at the beginning of an entry, 
or guide words at each sense (cf. comparable techniques in bilingual dictionaries for 
semantic groupings of translations). For example, MEDAL2’s index or ‘menu’ for the 
verb clear signals seventeen senses (also sections with phrases and phrasal verbs), 
beginning: 


(9) 1 empty a place
2 remove sth blocking sth
3 prove sb not guilty
4 weather: improve
5 start to disappear


4 A pioneer in typographical simplification was the first Cobuild dictionary of 1987: see Moon 
(2007: 173-4) for discussion. 
LDOCE4 has guide words instead: labels for its first five verb senses of clear are
SURFACE/PLACE; REMOVE PEOPLE; CRIME/BLAME ETC; PERMISSION; WEATHER. Such
labels provide immediate, crude, indications of general meaning, though they cannot 
substitute for the explanation proper. See Bogaards (1998) and Nesi and Tan (2011) for 
studies of learners and their navigation of long entries. 


8.2.2 Explanatory Techniques 


Explanations from OALD7 and LDOCE4 already cited indicate some of the lexico- 
graphic techniques used in current learners’ dictionaries. Perhaps most obvious is the 
use of paraphrase, often loose, informal and broadly descriptive, in preference to pre- 
cise, substitutable reformulations, or substitutable near-synonyms. Examples (10)-(16) 
are drawn from MEDAL2:


(10) a. calm 


(adjective) 1 not affected by strong emotions such as excitement, shock, or 
fear 


(verb) 1 to make someone feel more relaxed and less emotional 


b. caribou a large brown animal with long thin legs and horns on its head that 
lives in North America 


The classic definition model of genus+differentiae is still evident, but items used to indi- 
cate genus are general or vague: here not affected, make... feel, animal. Synonym-type 
explanations can be found in learners’ dictionaries, though often for lower-frequency 
headwords or where there are direct equivalents in another variety/register of English; 
some are in the form of cross-references: 
(11) a. careworn looking tired, worried, and unhappy 

b. carryall American a HOLDALL 

c. cascade (noun) 1 a small WATERFALL


Synonym-type explanations also occur for core items where there is no practical way to 
paraphrase: 


(12) a. capable 1 able to do something


b. case (noun) 1 an example or instance of something


Whereas in traditional dictionaries definitions are formal and impersonal, explana- 
tions in learners’ dictionaries are often informal and seem more interactive, rather as 
in classroom explanations of word meaning. In particular, you/your may be used for
human references, and they/them/their for anaphoric links with someone: 
(13) a. cargo things that are being sent by ship, plane, train, or truck 

b. carry yourself to hold or move your body in a particular way

c. cartoonist someone who draws cartoons, especially as their job 

d. cashpoint a machine that gives you money when you put a bank card into it. 


Traditionally, verb definitions preserve the transitivity of target senses. While learn- 
ers’ dictionaries note and code transitivity (essential information for learners), expla- 
nations deviate from strictly replicating transitivity patterns wherever this improves 
clarity and readability: 


(14) a. carry off (transitive) 1 to deal successfully with something 


b. cash in (intransitive) 1 to use an opportunity to make a profit or gain an
advantage: +on


c. characterize (transitive) 1 to be a typical quality or feature of someone or 
something 


d. chat (intransitive) 2 to exchange messages with someone using a computer 
so that you are able to see each other’s messages immediately, especially on 
the internet 


For some words—for example, grammatical words, items with pragmatic functions, 
etc.— explanations may take the form of usage notes, with no attempt to echo the word 
class of the definiendum. Usage-based explanations are discussed in Section 8.4; the 
following simply illustrate techniques: 


(15) a. can (modal verb) 1b. used for saying that you see, hear, feel, taste, smell,
understand, or remember something: Paul could hear someone calling his 
name 


b. certainly used for emphasizing that something is definitely true or will defi- 
nitely happen 


c. cheer up used for telling someone to try to be happier 


There remains one major explanation technique to discuss, a particularly conten- 
tious technique: the use of full-sentence explanations. These sidestep substitutability 
by explaining items as contextualized usages, rather than as isolated words: 


(16) a. carbonated a carbonated drink has small BUBBLES of air in it


b. cast (verb) 9 if a snake casts its skin, it slides out of it
c. catchy if a tune or phrase is catchy it attracts your attention and is easy to
remember 


d. caw when crows (= large black birds) caw, they make a loud unpleasant 
sound 


Most major learners’ dictionaries now use this technique sporadically, often in cases 
where words or senses are associated with collocational or other selection or structural 
restrictions. Here a traditional technique would have been to define cast as ‘(of a snake)
shed (skin, etc.)’.

It is possible to find comparable full-sentence explanations in early dictionaries: 


(17) a. To Calender Linen cloth is to smooth, trim, and give it a gloss, a term used
by Linen-Drapers. (Blount Glossographia 1656) 


b. A hare is said, by hunters, to carry, when she runs on rotten ground, or on frost,
and it sticks to her feet. (Johnson A Dictionary of the English Language 1755) 


However, this technique is more widely associated with children’s dictionaries and folk 
definitions, and informal or conversational ways of explaining meaning:


(18) a. cancel If you cancel a football match you arrange for it not to be played. (Neal
1965, A Sentence Dictionary) 


b. A rent charge is if you like a kind of mini mortgage a kind of mini mortgage
[sic] on my land whereby I own the land but I have a charge on i[t]. (Bank of 
English corpus, spoken interaction) 


With respect to learners’ dictionaries, full-sentence explanations are particularly asso- 
ciated with the Cobuild project (1980s-) led by John Sinclair: the technique was applied 
wholesale in its first learners’ dictionary of 1987 and in subsequent publications. That 
is, all explanations of word meaning were formulated as full sentences, irrespective of 
whether or not there were special lexicographical patterns or collocational restrictions 
to incorporate, or pragmatic usages to describe. These are from Cobuild1:


(19) a. calf 3 Your calf is the thick part at the back of your leg between your ankle
and your knee. 


b. calico is plain white fabric made from cotton. 


5 The Bank of English corpus was created by Cobuild at the University of Birmingham.
c. call (verb)


1 If someone or something is called a particular thing, it is their name or 
title. 


14 If you call on someone, you make a short visit to see them or to deliver 
something. 


d. callow A young person who is callow has very little experience or knowledge 
of the way they should behave as an adult. 


e. cap 3 You say ‘If the cap fits’ to someone when you mean that they can take a 
remark as applying to them if they feel that it is appropriate. 


f. coo
1 When a dove or pigeon makes a soft sound, it coos. 


3 People sometimes say coo when they are surprised or impressed by 
something 


g. crunchy 
1 Food that is crunchy is hard or crisp so that it makes a noise when you eat it. 


2 Gravel or snow that is crunchy makes a noise when you step on it. 


Typical explanatory patterns include the use of interactive you/your, structures with if/ 
when, and the formula you say... you mean. 
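
The three explanatory patterns just named can be spotted mechanically in the wording of an explanation. The sketch below is purely illustrative and makes several assumptions: the pattern labels, the regular expressions, and the function name `explanatory_patterns` are hypothetical, and surface matching of this kind is far cruder than a linguistic analysis of explanation structure.

```python
import re

# Hypothetical surface patterns, one per explanatory device named in the
# text. Each is a rough approximation chosen for illustration only.
PATTERNS = [
    ("you say ... you mean",
     re.compile(r"\byou say\b.*\byou mean\b", re.IGNORECASE)),
    ("if/when structure",
     re.compile(r"^(if|when)\b", re.IGNORECASE)),
    ("interactive you/your",
     re.compile(r"\b(you|your)\b", re.IGNORECASE)),
]


def explanatory_patterns(explanation):
    """Return the labels of all patterns found in one explanation."""
    return [label for label, pattern in PATTERNS
            if pattern.search(explanation)]


# Explanations quoted from example (19) in the text.
call_on = ("If you call on someone, you make a short visit to see them "
           "or to deliver something.")
cap_fits = ("You say 'If the cap fits' to someone when you mean that they "
            "can take a remark as applying to them if they feel that it "
            "is appropriate.")
calico = "calico is plain white fabric made from cotton."

for text in (call_on, cap_fits, calico):
    print(explanatory_patterns(text))
```

On these three explanations the sketch reports an if/when structure with interactive you for call on, the you say ... you mean formula with interactive you for the cap idiom, and no pattern at all for calico, which is consistent with the later criticism that a full sentence adds little for simple nouns.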

Cobuild1 was also the first dictionary to be written entirely on the basis of a corpus
(in the modern sense): that is, lexicographers wrote each entry ab initio from corpus 
evidence. This evidence foregrounded the phraseological patterning associated with 
individual words, senses, and phrases, which not only disambiguated polysemous 
words, or foregrounded semi-lexicalized usages, but pointed up lexico-grammatical 
constraints on word choice over and above syntactic structure or selection restric- 
tions: meaning and phraseology were in fact inseparable. Thus the adoption of con- 
textualized explanations, in lieu of traditional definitions, was both a response to 
the evidence and a realization, to paraphrase Hanks, that dictionaries had ‘got the 
equation wrong’ by placing co-textual and contextual information as part of the 
definiens, not the definiendum: thus in the context of crunchy, explanations need to 
relate to ‘crunchy food’ and ‘crunchy gravel/snow’, not crunchy as an abstract con- 
cept. See Hanks (1987), Sinclair (1991), and Barnbrook (2002, 2013) for discussion of the 
principles behind Cobuild explanations and their structures; see Moon (2007, 2009) 
for overviews of Cobuild lexicographical practice; see Hanks (this volume) for fur- 
ther discussion of full-sentence definitions, including theoretical and philosophical 
perspectives. 

Cobuild’s explanatory techniques had critics, who commented on awkward struc- 
tures, rambling explanations, and the absence of a controlled defining vocabulary 
(though in practice most explanations used words from a core vocabulary of about 
2,500 items); also, the use of full-sentence explanations for very simple nouns such 
as calico (19b) was seen as redundant. But Cobuild’s essential principles, that items
should be explained within their phraseological contexts, that entries should repre- 
sent the kind of information so conspicuous in corpus evidence, and that dictionary 
conventions, structures, and design needed to be simplified, had a profound influ- 
ence on the so-called third generation of learners’ dictionaries in the mid-1990s, 
and the way in which phraseologies and meanings came to be presented—includ- 
ing the use of full-sentence explanations. See Rundell (2006) for discussion of full- 
sentence explanatory techniques and their advantages and disadvantages; see also 
Cowie (1999: 148ff) for a critique; and Chan (2014) for a recent user study, comparing 
techniques.


8.2.3 Pictorial Explanations 


An important explanatory technique in children’s dictionaries is ostention, whereby 
the meaning of items is shown through illustrations and diagrams as well as words. 
Learners’ dictionaries also usually provide illustrations: line drawings, etc. within the 
main dictionary text, and/or group illustrations in end matter or other inserts. Clearly, 
simple illustrations for common and complex real-world entities can be preferable to 
discursive explanations, or at least provide useful adjuncts: for example, OALD7’s first 
two pages of words beginning C include pictures of cactuses and a cafetière to help
locate the meaning of the verbal explanations: 


(20) a. cactus a plant that grows in hot dry regions, especially one with thick STEMS
covered in SPINES but without leaves. There are many different types of 
cactus. 


b. cafetière a special glass container for making coffee with a metal FILTER that
you push down 


These pages also have cross-references from cab to an illustration at truck, and from 
cabbage and cabin to grouped full-colour illustrations for FRUIT/VEGETABLES and 
BUILDINGS in an appendix. Illustrations can also provide useful vocabulary-building 
information: for example, MEDAL2’s in-text drawing for camera not only shows what 
a typical (non-digital) camera looks like, but labels parts such as viewfinder and lens. 
Some illustrations, however, can be distracting rather than helpful, or seem intended 
mainly to break up dictionary text: MEDAL2’s drawing for cage depicts an old-fashioned
domed birdcage, while camouflage has a cartoon of someone disguised as a tree in
a wood. While the clearest illustrations are those for physical objects, dictionaries some- 
times also provide illustrations for actions/activities, positions/directions, emotions, 
and so on: for example, LDOCE4’s illustration of SURFACES depicts adjectives such as
prickly, fluffy, shiny, greasy by drawings of, respectively, a cactus, a kitten, a pair of shoes, 
and a dirty stove. (However, such illustrations can sometimes be misleading or open 
to misinterpretation.) Illustrations in learners’ dictionaries may also be culture-bound,
reflecting anglophone ways of life and so not necessarily accessible to or recognizable by
learners from other cultures. See Klosa (this volume) for an overview of illustrations in
dictionaries and Heuberger (this volume) for a critique of those in learners’ dictionaries;
see also Ilson (1987) for a typology of illustrations with particular reference to learners’ 
dictionaries. 


8.3 EXPLAINING MEANING IN CONTEXT 


The explanations cited above mostly omit the abundance of information which 
learners’ dictionaries provide in terms of syntax, collocation, usage, synonyms/ 
antonyms, etc., and examples. However, this information helps to contextualize and 
explain meaning, both in decoding and encoding, and is discussed in the following 
subsections. 


8.3.1 Phraseology in Explanations 


The current practice in learners’ dictionaries is for phraseology to be represented as 
fully as possible: if appropriate, the typical co-texts of individual words, senses, and 
phrases are incorporated into explanations.⁶ Of course, some words can be conveniently
and satisfactorily explained in comparative isolation, as in these from LDOCE4,
which are unaccompanied by examples or anything beyond basic grammatical class 
and register: 


(21) a. calcify v [I, T] technical to become hard, or make something hard, by adding 
LIME 


b. camembert n [U, C] a soft French cheese that is white outside and yellow 
inside 


c. camera-shy adj not liking to have your photograph taken 


But many words are not so simple, and so in learners’ dictionaries there are now large 
numbers of entries with extensive information about phraseological patterning, fore- 
grounded and exemplified, which enable them to provide more nuanced explanations 
of word meaning in context. In LDOCE4, these patterns are in bold:


6 In this respect, they reflect Firth’s point about ‘meaning by collocation’, later implemented 
lexicographically through the work of Sinclair: see Firth (1957a), Sinclair (1991), and an overview in
Moon (2008). 
(22) a. calculation n 1 [C usually plural, U] when you use numbers in order to
find out an amount, price, or value: make/do a calculation Dee looked at the
bill and made some rapid calculations. | by somebody’s/some/many calculations
By some calculations, the population will reach 8 million soon.


b. chill n


1 [singular] a feeling of coldness: There was a slight chill in the air. | morn- 
ing/autumnal/January etc chill Suddenly aware of the morning chill, she 
closed the window. | [+ of] He sat in the chill of the evening, staring out over 
the city below. | I turned on the heater in the hall to take the chill off the 
house (=to heat it slightly). 


2 [C] a sudden feeling of fear or worry, especially because of something cruel 
or violent: The sound of his dark laugh sent a chill through her. | chill of fear/ 
apprehension/disquiet etc Fay felt a chill of fear as she watched Max go off 
with her daughter. | There was something in his tone that sent a chill down 
Melissa’s spine (=made her very frightened). 


Such entries not only show the lexico-grammatical patterns associated with different 
senses (and inseparability of meaning and phraseology) but better reflect the semantic 
continuum that exists between simplex words and multi-word items. In contrast, tra- 
ditional dictionaries are obliged by placement conventions and constraints to present 
words and phrases as typologically discrete. 


8.3.2 The Role of Examples 


Entries like those just cited show clearly the part which examples play in the explanation
of meaning. Glossed examples, as with take the chill off and send a chill down,
provide information about particular contexts of use; other examples, whether demonstrating
structures or lexical collocates, provide models for appropriate encoding,
rather like the translated examples in bilingual dictionaries. In ISED/ALD1 some
explanations are fused with examples (and some examples seem hortatory):


(23) a. careless 2 light-hearted and cheerful, as careless little singing-birds. 3 (of 
actions) done or made without care, as a careless mistake. 


b. chew move (food, etc.) about between the teeth or in the mouth, You should 
always chew your food well before you swallow it. 


In current editions of learners’ dictionaries, many examples are chosen to demonstrate
phraseological patterns; however, some are included simply to clarify explanations, as 
in these for high-frequency/core senses and metaphorical uses: 
(24) a. car 1 a road vehicle for one driver and a few passengers... She got into her car
and drove away. (MEDAL2)


b. carry 1 to support the weight of sb/sth and take them or it from place to
place; to take sb/sth from one place to another: He was carrying a suitcase.
◇ She carried her baby in her arms. ◇ The injured were carried away on
stretchers. (OALD7)


c. carpet 3 ~ (of sth) (literary) a thick layer of sth on the ground: a carpet of 
snow. (OALD7) 


d. carve 3 if wind, a river, etc. carves something in the land, its action forms 
it over a period of many years: The river has carved a series of spectacular 
gorges. (MEDAL2) 


Some learners’ dictionaries have used citations as examples to demonstrate meaning 
more fully in actual contexts of use. For example, Cobuild lexicographers were fully 
committed to the inclusion of corpus examples, minimally edited or unedited, as with 
these from Cobuild: 


(25) a. careen If someone or something careens somewhere, they rush forward in
an uncontrollable way... The truck sways wildly, careening down narrow 
mountain roads. 


b. careful 3 If you tell someone to be careful about doing something, you think 
that what they intend to do is probably wrong... I think you should be careful 
about talking of the rebels as heroes. 


c. caretaker 2 A caretaker government or leader is in charge temporarily until 
a new government or leader is appointed. The military intends to hand over 
power to a caretaker government and hold elections within six months. 


8.3.3 Ancillary Information about Meaning 


Other features in learners’ dictionaries provide further support for explanations of 
meaning, and a case in point is the inclusion of synonyms and antonyms. Some sim- 
ply map standard lexical relations, as when OALD7 offers careless as an antonym for 
careful, and careful for careless (both items within their defining vocabulary). But 
elsewhere synonyms, possibly already familiar to learners, supply alternatives to or 
expansions of periphrastic explanations, as in these from OALD7: 


(26) a. calamity an event that causes great damage to people's lives, property, etc. 
SYN DISASTER 
b. calculate to use numbers to find out a total number, amount, distance, etc.
SYN WORK OUT 


c. callous not caring about other people's feelings or suffering SYN CRUEL,
UNFEELING 


More specific information about near-synonyms, especially of core vocabulary 
items, is sometimes provided in print dictionaries within boxed features in the 
alphabetical text (in online versions, such features are linked to headwords or 
search items). Whereas traditional thesauruses only list words, these more discur- 
sive features provide contextualized explanations and, through adjacency, indicate 
distinctions in meaning and usage. Hence a feature in MEDAL2 at change (verb) 
addresses the set change, alter, adjust, adapt, convert, modify, transform, vary; one 
in OALD7 at campaign addresses the set of nouns campaign, battle, struggle, drive, 
war, fight. Compare too OALD7’s ‘which word?’ features for confusable items, as in 
this for calm/calmness: 


(27) The noun calm is usually used to talk about a peaceful time or situation: There 
was a short period of uneasy calm after the riot. It can also be used to describe a 
person’s manner: She spoke with icy calm. Calmness is usually used to talk about 
a person: We admired his calmness under pressure. 


Some learners’ dictionaries have additional features on word meaning as part of 
mid- or end-matter, helping learners to develop and deepen their understanding of 
English lexis. MEDAL2’s mid-matter has an extended feature on writing skills with 
much word-based information about the expression of ideas: for example, it pro- 
vides information about the usage of moreover/furthermore/besides, and about words 
indicating similarities (resemblance, similarity, analogy, parallel, correspond). A 
further extended feature ‘Expand your Vocabulary’ looks at items expressing communication,
emotions, and movement and their meanings: for example, a subsection
‘To walk quietly’ explains creep, tiptoe, pad, sneak. Another mid-matter article in 
MEDAL2 addresses conceptual metaphor (cf. Lakoff and Johnson 1980), and this is 
keyed into a series of features in the alphabetical text with sets of metaphorically-linked
words and phrases: see Moon (2004) for discussion. This extract is from an
entry at confused in MEDAL2:


(28) Being confused is like being lost or being in the wrong place or position.

You've lost me. What do you mean? ♦ ... I felt adrift and alone, with no real sense
of direction. ♦ ... I don't know if I'm coming or going. ♦ ... You’ve got it all back to
front.
8.4 EXPLANATIONS FOR SPECIAL KINDS OF WORD


8.4.1 High-frequency Words 


A particular lexicographical challenge is how to explain central items in the English 
lexicon, including those in the defining vocabulary, items with which learners are 
expected to be familiar already. Explanations for these become place-holders, since it 
is difficult to construct paraphrases which are not harder to understand than the items 
being explained, nor so vague as to be meaningless. The following explanations are 
from OALD7, but all learners’ dictionaries find such words problematic, and catch and 
throw are particularly intractable: 


(29) a. catch 1 to stop and hold a moving object, especially in your hands


b. throw 1 to send sth from your hand through the air by moving your hand or 
arm quickly 


c. central 1 most important 3 in the centre of an area or object


d. chance 1 a possibility of sth happening, especially sth that you want


Where grammatical items are concerned, many would argue that these cannot 
be treated adequately in dictionaries but belong instead in grammar textbooks 
(which is where learners would probably turn for advice). These entries tend to 
be place-holders too, with ‘senses’ often having usage-based explanations as in 
example (15a) above, and perhaps tied to extended mid-/end-matter discussions 
of grammar, or other special features (e.g. one in OALD7 contrasts the modals 
can and may). The following taken from LDOCE4 are representative of strategies 
adopted: 


(30) a. a also an indefinite article, determiner 1 used to show that you are talking
about someone or something that has not been mentioned before 


b. about prep 1 concerning or relating to a particular subject 

c. although conjunction 1 used to introduce a statement that makes your 
main statement seem surprising or unlikely 

d. anything pron 1 any thing, event, situation etc, when it is not important to 
say exactly which 
8.4.2 Evaluative and Cultural Meanings


Learners need information about evaluative orientations and connotations of items in 
order to decode subtext and encode appropriately. This is sometimes left implicit in the 
wording of the explanation: 


(31) a.  caprice 1 a sudden and unreasonable change of mind or behaviour 
(LDOCE4) 
b. carry-on 1 BrE spoken a situation in which someone behaves in a silly or 
annoying way (LDOCE4) 
c. charlatan a person who claims to have knowledge or skills that they do 
not really have (OALD7) 
d. colonialism a situation in which one country rules another (MEDAL2) 


The pejorative meanings of the first two are clear, but those of the third and fourth are
not stated or explained. Safer strategies occur elsewhere, using labels such as approving 
or disapproving, or usage-based explanations: 


(32) a. childish 2 (disapproving) (of an adult) behaving in a stupid or silly way 
(OALD7) 


b. calculating If you describe someone as calculating, you disapprove of the 
fact that they deliberately plan to get what they want, often by hurting or 
harming other people. (Cobuild2) 


Similarly, cultural associations may be described overtly, or left implicit, perhaps in 
examples. These from MEDAL2 have a clear pedagogical intention:


(33) a. cardigan a jacket KNITTED from wool, that you fasten at the front with but- 
tons or a zip. Cardigans are usually thought of as an old-fashioned rather 
boring piece of clothing, worn mainly by older people. 


b. Christmas dinner a traditional meal eaten at Christmas, in the UK often 
consisting of TURKEY (= a large bird) with vegetables, followed by a heavy 
fruit PUDDING called Christmas pudding 


c. curry an Indian food consisting of meat or vegetables cooked in a sauce with
a hot flavour, often eaten with rice: a chicken/lamb/vegetable curry Curry is
rapidly becoming the UK’s favourite dish. 


See Moon (2014) for a discussion of learners’ dictionaries in relation to culture and 
their treatment of ideologically loaded items. 

8.4.3 Explanations of Pragmatics


Where words or phrases have particular textual or situational functions—for example,
uses as discourse markers, or as greetings, apologies, etc.—explanations need to show
their pragmatics. Sometimes dictionaries simply offer alternative items as explanations,
or leave functions implicit in examples; sometimes they provide usage-based
explanations, with formulae such as ‘used for saying’, ‘used when you ...’, ‘people
sometimes say ... when ...’, and so on. These MEDAL2 entries use mixed techniques:


(34) a. cheers 1 used for expressing good wishes when holding a glass of alcohol, 
just before you drink it 2 British informal thank you: ‘Here’s that book you 
wanted to borrow.’ ‘Oh, cheers.’ 3 British informal goodbye


b. the contrary the opposite: Evidence suggests that the contrary is true. 
• quite the contrary: I don’t disagree—quite the contrary—I think you're
absolutely right. 

PHRASES on the contrary used for emphasizing that something is true, 
even though it is the opposite of something that has been said: The risk of 
infection hasn't diminished—on the contrary, it has increased. 


to the contrary making you think that the opposite may be true: Despite 
all evidence to the contrary, he believed his plan would succeed. 


c. cordially formal in a cordial way a. used in formal invitations: Staff and 
students are cordially invited to the ceremony. 


MEDAL2 further offers a mid-matter section on pragmatics, and other learners’
dictionaries similarly draw attention to pragmatic usage, sometimes with special features,
and/or labels at individual senses in the main dictionary text: for example, Cobuild2
used the generic label PRAGMATICS, though later editions adopted flags such as
emphasis or vagueness instead.

All such pragmatic information is important for decoding and encoding, but none 
as important as the labelling of potentially offensive and taboo words: learners’ dic- 
tionaries have to be unambiguous about social norms. Typical labels used include 
impolite, insulting, offensive, taboo, perhaps incorporated into usage-style explana- 
tions, and occasionally with advice on alternative encodings, either in entries or special 
features—these examples are from OALD7: 


(35) a. coolie (old-fashioned, taboo) an offensive word for a worker in Eastern coun- 
tries with no special skills or training 


b. crap (taboo, slang) 2 something of bad quality: This work is complete 
crap.... HELP More acceptable words are rubbish, garbage, trash or junk. 

8.4.4 Technical Words


Finally, technical terms: these often, though not always, have direct equivalents in 
learners’ first languages (so bilingual dictionaries or term banks might seem more 
appropriate and helpful as sources for information). Yet learners’ dictionaries still 
include items if they are reasonably frequent in the English lexicon or are part of the 
specialist lexis used by learners. The metalanguage of education and language study are 
cases in point, as in these from LDOCE4: 


(36) a. cloze test a test in which words have been removed from a short piece of 
writing, and students have to write what they think are the correct words in 
the empty spaces 


b. collocation technical the way in which some words are often used together, 
or a particular combination of words used in this way: ‘Commit a crime’ is a
typical collocation in English. 


Particularly problematic are scientific and other terms, where explanations, couched 
in defining vocabulary, have to explain concepts and phenomena rather than lexical 
items and usage. Explanations like these often rely on subject labels and cross-refer- 
ences, and sometimes secondary sentences. The following are from MEDAL2: 


(37) a. camshaft a bar in an engine, fixed to a CAM


b. carbohydrate BIOLOGY a substance found in foods such as sugar, bread, and
potatoes. Carbohydrates supply your body with heat and energy. 


c. care order LEGAL a legal arrangement in the UK for the local Social 
Services to look after a child instead of the child’s parents 


d. cathode CHEMISTRY the negative ELECTRODE in a BATTERY or similar piece
of electrical equipment, or the NEGATIVE electrode in an ELECTROLYTIC 
CELL 


e. chiaroscuro ART the way that light and dark areas create a pattern, espe- 
cially in drawings and paintings 


8.5 FURTHER ISSUES AND DEVELOPMENTS 


A key question with learners’ dictionaries is whether they actually help learn- 
ers improve proficiency in English, both decoding and encoding: are monolingual 
explanations of meaning more effective than translated equivalents? Over the last
thirty or so years, multiple surveys of dictionary use have sought to answer this ques- 
tion. Béjoint (2010: 223ff) provides an overview of findings: one important conclusion is 
that ‘monolingual dictionaries are mostly used for meaning’ (2010: 243). Interestingly, 
there seems no consistent correlation between learners’ preferences for certain types 
of dictionary explanation, including full-sentence explanations, and their success in 
decoding/encoding (2010: 251). See Béjoint (2010) for summaries of the most important 
studies; also Béjoint’s own seminal paper of 1981, and a special issue of the International
Journal of Lexicography edited by Lew (2011a).

To explain meaning monolingually for non-native speakers is inherently problem- 
atic, yet explanations can show meaning in context and explain nuances and struc- 
tures in ways that purely bilingual dictionaries cannot. ‘Bilingualized’ dictionaries 
offer a kind of compromise. These are often based on monolingual learners’ dictionar- 
ies and are unidirectional, that is, from English into the other language. Monolingual 
explanations are typically accompanied by translations of the headword, with one or 
more equivalents listed; however, different techniques and degrees of bilingualiza- 
tion are found. For example, the following excerpts from entries for cape ‘promon- 
tory’ and calm (of weather or the sea) are taken from three bilingualized versions of the 
Collins Cobuild Student’s Dictionary (1989), respectively Brazilian Portuguese (which 
simply translates the original text, apart from the English headword, without giving 
the Portuguese equivalent—arguably the information that learners would most like), 
Czech (which gives the original English, Czech translation, and then the Czech equiva- 
lent), and Finnish (which gives the original text or a light adaptation, and the Finnish 
equivalent of the relevant headword or sense). The English original is given first: 


(38) a. A cape is a large piece of land that sticks out into the sea.

b. A cape é uma grande ponta de terra que entra pelo mar.

c. A cape is a large piece of land that sticks out into the sea. + Cape je velký
kus země, který vyčnívá do moře. ¥ mys.

d. A cape is a large piece of land that sticks out into the sea. « niemi


(39) a. If the weather or sea is calm, there is no wind and so the trees are not
moving or the water is not moving.

b. Se o tempo ou o mar estão calm, não há vento e portanto nem a água nem
as árvores estão se mexendo.

c. If the weather or sea is calm, there is no wind and so the trees are not
moving or the water is not moving. + Jestliže je počasí nebo moře calm,
nefouká žádný vítr, takže se nepohybují stromy ani se nepohybuje voda. ¥
klidný.

d. If the weather is calm, there is little or no wind. If the sea is calm, the water
is not moving very much. « tyyni

See Cowie (1999: 192ff) for an overview of bilingualized learners’ dictionaries; see also
discussions by Adamska, Fontenelle, and Heuberger (all this volume). 

Learners often find it difficult to locate information in traditional paper dictionaries.
Electronic formats—web-based, apps, etc.—offer improvements: see Nesi (this volume)
for discussion. In particular, different kinds of information about meaning may
be presented more clearly in visual terms, or integrated with parallel bilingual entries, 
thesaurus entries, and so on. At the time of writing, the main British advanced learn- 
ers’ dictionaries are available free online. While entries are based largely on recent print 
editions, facilities offered include hyperlinks to words in explanations—that is, click- 
ing on an unfamiliar definition word accesses the entry for that word (not necessarily 
the right sense). Most offer vocabulary extension material: Longman has ‘topic diction- 
aries’ and user guides; Oxford links to entries with illustrations, usage notes, defining 
vocabulary, and blogs. Macmillan also offers blogs, and integrates thesaurus material: 
for example (in April 2013), chat ‘talk’ has links to a list with catch up, rap, make
conversation, etc. (along with definitions), and the computer-mediated sense links to words
such as bookmark, google, flame, post. Its dictionary entry for cheers is similarly linked 
to ways of saying goodbye or thanking, and, for its usage as a toast, to a series of items 
‘relating to or associated with drinking alcohol’, including down the hatch and here's 
to... (as well as AA, abuse, alcoholism, delirium tremens, hangover).

While electronic formats allow greater possibilities for learners’ dictionaries and 
better explanations of meaning, new media create other challenges, not least to the eco- 
nomic models which over the last seventy years, and last thirty years in particular, have 
sustained publishers’ investments in the development of new learners’ dictionaries: 
costly corpus research, lexical analysis ab initio, and wide-ranging user research. The 
web has also brought in lexicographical democratization, ranging from crowd-sourced 
and collaborative dictionaries such as Wiktionary and Urban Dictionary to publishers’ 
own invitations for contributions—words of the moment, favourite words, and the like. 
Urban Dictionary is idiosyncratic, linguistically unreliable, and often offensive (as well 
as amusing); Wiktionary is a more serious enterprise though its lexicographical model
seems somewhat dated and its entries erratic. Most relevant for learners’ dictionaries is 
Simple English Wiktionary, ‘an online dictionary that uses simpler words so it is easier 
to understand’ and intended to ‘be easier to read by people who do not speak English 
well’ (<http://simple.wiktionary.org/wiki/Main_Page>). It adopts various explanatory 
techniques, including full-sentence explanations and illustrations, and provides exam- 
ples to clarify meaning (mostly omitted below). In the following, underlining indicates 
that words are hyperlinked to other entries: 


(40) a. calculate (verb) 


1 (transitive) If you calculate something, you find its value, usually by using 
mathematics.
4 (intransitive) If you calculate on something, you plan on it or expect it. 

b. calm (adjective)


1 If a person, place or situation is calm, it is peaceful and quiet. There’s no 
wind and the lake is calm. I was angry a minute ago, but now I am calm.


c. chair 
1 A chair is a piece of furniture for one person to sit on.
2 The person who is in charge of a meeting is the chair of the meeting.
3 A chair is a position of professor in a university. 


Other entries explain more traditionally: 


(41) a. cactus A type of plant that has spines instead of leaves and grows in dry 
places. 


b. calcium An element with atomic number 20 and symbol Ca. Also a nutrient 
in many different foods. 


At the time of writing (April 2013), spines and nutrient in these explanations are linked 
into invitations to contribute new entries for those words, which are as yet undefined. 
Explanations for chair 1 and cactus are accompanied by photographs, calcium by a 
diagram of the periodic table with Ca highlighted. Entries for chair and calcium are 
also linked to entries in Simple English Wikipedia, as is calm, where the encyclopaedic 
information includes etymology, glosses in relation to weather and sea conditions, and 
a seascape by Manet ‘Calm Weather’. 

As of April 2013, Simple English Wiktionary offers only partial coverage of the lexicon,
with just under 20,000 entries, but there are hints of its immense potential, for example 
via extensive systems of links between different texts, media, and kinds or levels of lexi- 
cal information. It will be interesting to see how it develops and evolves: whether the 
opportunities for interactions between its amateur lexicographers, software designers, 
target users, and teachers lead now to more dynamic and creative ways of explaining 
meaning for learners; also, what effect this will have on conventional publishers and 
the learners’ dictionaries that they produce, especially where they too engage more 
directly and interactively with their users. 

9.1 THE CENTRALITY OF MEANING
IN BILINGUAL LEXICOGRAPHY


When people consult a dictionary, they most often do so not in order to check a word's
spelling or to learn about its grammatical properties, but because they want to find 
out what the word means. This is true in particular of bilingual dictionaries whose 
source language (SL) is a foreign language (L2) for the dictionary user and whose tar- 
get language (TL) is either the user’s native language (L1) or a foreign language better 
known to them than the dictionary’s source language.¹ Such dictionaries are expected
to explain the meaning of each SL item, ideally through a TL equivalent. 

When the source language of the dictionary is the user's native language, a typical
aim of consultation is to establish what foreign-language word or expression will be
the best translation of a given mother-tongue item. Accordingly, an L1-L2 dictionary
is expected to suggest ways of rendering an L1 item in L2, again ideally by an L2
equivalent. It follows that the function of explaining meaning is more relevant to the
L2-L1 than the L1-L2 dictionary: in the case of the latter, understanding what the SL (L1)
item means can be taken for granted. Nevertheless, there, too, it is commonality of
meaning which links the L1 headword and the proposed L2 equivalent(s). Whichever
subtype of bilingual dictionaries we look at, then, meaning seems to be of primary
importance for both their users and compilers.


1 For simplicity’s sake, in the following we concentrate on cases when one of the dictionary’s
languages is the user’s mother tongue. It should be kept in mind, however, that most of what is said
here about L2-L1 and L1-L2 dictionaries is equally applicable to situations when neither of the object
languages of a bilingual dictionary is the native language of the user, but one of them is more familiar
(‘less foreign’) than the other and can thus serve as a vehicle of meaning transfer.

In accordance with what has just been said, metalexicographers assume that an
L2-L1 dictionary is meant mainly for reception (broadly speaking: understanding
texts in a foreign language), while the primary task of an L1-L2 dictionary is to serve as
an aid in the user’s own foreign language production. At least, this is the theory. In
reality, most bilingual dictionaries have to serve two speech communities at once, the
same dictionary acting as an L2-L1 dictionary for one group of users and an L1-L2
dictionary for the other. For instance, a particular English-Urdu dictionary can be an
L2-L1 dictionary for speakers of Urdu and, at the same time, an L1-L2 dictionary for
speakers of English; in the terminology proposed by Hausmann and Werner (1991), such
dictionaries are known as bidirectional.

Monodirectional dictionaries, that is, those addressed specifically to one group of
users (e.g. an English-Urdu and/or an Urdu-English dictionary aimed exclusively at
native speakers of Urdu), are less viable economically, especially when one of the
languages is less widely spoken and less often learnt than the other. In such cases,
publishers opt either for a bidirectional dictionary or for a monodirectional one targeted
at the community speaking the less popular language.² Fortunately for the general public,
recent years have witnessed a considerable increase in the number of monodirectional
dictionaries published worldwide.

Partly for this reason, but mainly with clarity of presentation in mind, the following
discussion will focus on monodirectional dictionaries. It will also, to some extent,
privilege L2-L1 dictionaries. These not only appeared earlier on the European scene,³
but are still, arguably, consulted more readily, people being more prone to turn to a
dictionary when they do not understand a foreign word or expression than when they
want to express themselves in a foreign language (for which task they may choose to
rely on their existing lexical repertoire, even if it means sacrificing precision and
idiomaticity of the resulting utterance).

In the history of theoretical reflection on bilingual dictionaries, one scholar stands
out as a particularly vocal advocate of the claim that the primary task of a bilingual
L2-L1 dictionary is to explain the meanings of L2 lexical items to speakers of L1: the
Russian linguist and lexicographer Lev Ščerba. Dissatisfied with existing bilingual
dictionaries—which he dubbed translation dictionaries—Ščerba (1940) argued for a
new type of L2-L1 reference work, a so-called explanatory dictionary. Instead of giving
L1 translations, which, in his view, inevitably misrepresented the meaning of SL
headwords, an explanatory dictionary would provide L1 explanations of meaning. It
would thus resemble a monolingual dictionary in that it would offer definitions, with


2 Quite often, such a dictionary is advertised as being designed to serve both communities, whereas 
in fact it has been written with the ‘more interested’ community in mind. In any case, serving both 
communities equally well seems to be impossible, at least in print dictionaries (Atkins 1992/3: 31n.). 

3 This can be explained by the fact that the foreign language dealt with in medieval dictionaries
was almost uniformly Latin, and people naturally had more need of understanding the meanings of
Latin words and expressions than of expressing themselves in Latin. But even later, when dictionaries
linking pairs of vernacular languages started to be compiled, L2-L1 dictionaries tended to precede
their L1-L2 counterparts.

the important difference that those definitions were to be formulated in the dictionary’s
target language (at the same time, the native language of the dictionary’s intended
users).

Ščerba’s ideas apparently failed to appeal to practising lexicographers, for very few
dictionaries have been produced in accordance with his principles.⁴ Today’s bilingual
dictionaries are still predominantly instances of the type Ščerba called a translation
dictionary, that is, dictionaries in which SL items are furnished with their TL lexical
counterparts. More precisely, since most words in most languages are polysemous, it
is the individual senses of words and expressions for which TL counterparts are given.
What qualifies as such a counterpart (known in lexicography as an equivalent) is by
no means obvious. There are at least two major reasons for that: the nature of meaning
itself and the lack of interlingual isomorphism.


9.2 THE NATURE OF MEANING


Meaning is an elusive phenomenon. Contrary to popular belief, there are no discrete 
meanings ‘out there’ which it would be the lexicographer’s task simply to identify and 
describe: words exhibit a meaning continuum, with different meanings shading into 
one another, often imperceptibly. Crucially, the meaning of an individual lexical item 
considered in isolation is heavily underspecified. As has been convincingly argued by 
linguists (Allwood 1999, 2003) and lexicographers (Hanks 2000), rather than having a 
number of more or less fixed meanings, words are better thought of as having certain 
meaning potentials. These only get actualized—through being constructed by speaker 
and hearer, or writer and reader—in particular situations of language use.⁵

The lexicographer’s task is to analyse many typical instances of language use (as 
recorded in a language corpus) from which actual meanings emerge, and to formulate 
generalizations on the basis of the analysed instances which can later be presented in 
a dictionary. An essential part of the job consists in deciding which potential mean- 
ings of a particular headword are to be included in its dictionary entry. In the light of 
what has been said above, it must be remembered that those meanings (called senses in 
lexicography) are abstractions from real-life language data—in other words, they are 
lexicographic artefacts. In the bilingual context, the construction of dictionary senses 


4 TL definitions instead of equivalents are more commonly found in scholarly dictionaries whose
source language is either no longer spoken (e.g. Sanskrit) or very exotic from the point of view of 
the intended users. Compare Ashdowne, this volume. Another case are specialist dictionaries, for 
instance, of slang, where giving an exact TL counterpart for the headword is impossible in most 
entries. 

5 More precisely, according to Allwood (2003: 43), the meaning potential of a word is ‘all the
information that the word has been used to convey either by a single individual or, on the social
level, by the language community. ... it is the union of generally and collectively remembered
uses. ... Meaning potentials are activated through various cognitive operations. Some of these are
triggered through language use; others can be activated independently of language.’

is additionally complicated by the fact that the lexicographer is dealing with two
languages at once, looking at one through the prism of the other.


9.3 ANISOMORPHISM 


Dividing a bilingual dictionary entry into senses is intimately connected with the task
of providing target language equivalents (see Fontenelle, this volume). What makes the
search for those equivalents challenging—and what made Ščerba look for an alternative
to the tradition-sanctioned translation dictionary—is the fact that different languages
are structured differently. Differences can be observed at all levels of language
organization, the lexicon—the primary object of dictionary description—being no
exception. The essential incompatibility of the lexical systems of different languages
has long been commented upon by philosophers, linguists, translators, and dictionary
makers. In metalexicography, it has come to be referred to as anisomorphism, a term
coined by the Czech linguist and lexicographer Ladislav Zgusta.⁶

That languages are not isomorphic does not, of course, mean that no lexical item of 
one language (in one of its senses) can be matched with a lexical item of another. What 
it means is that such pairings are not possible for every single item in every single one of 
its senses. In general, the more distant—genetically, typologically, culturally—the two 
languages in question, the greater the number of gaps that we can expect in the match- 
ing procedure. Such gaps can be of two types: referential or lexical (see, e.g., Svensén 
2009: 273). In the former case, the lack of an equivalent is due to the lack of a referent: a 
particular object, phenomenon, custom, etc. does not exist in the TL culture and, as a 
result, there is no word for it in the target language. In the latter case, although a given 
referent may be present in the TL culture, or a particular idea familiar to its members, 
there is nonetheless no established name for it; linguists say that the concept has not 
been lexicalized in the target language. 

A referential or lexical gap in the target language is, naturally, no excuse for failing 
to provide any information about the SL headword’s meaning in the dictionary or for 
omitting the troublesome headword altogether. Not only must an alternative solution 


6 Anisomorphism is simply a shorter version of lack of isomorphism, where isomorphism is
understood as one-to-one correspondence between the elements of two systems. Zgusta (1971: 294 ff.)
introduced the term thus: ‘The basic purpose of a bilingual dictionary is to coordinate with the
lexical items of one language those lexical units of another language which are equivalent to their
lexical meaning. ... The fundamental difficulty of such a coordination of lexical units is caused by
the anisomorphism of languages, that is, by the differences in the organization of designata in the
individual languages. ... It would be a mistake to think that this can happen only if the two cultures
are vastly different. ... It would be another mistake to think that it is only the difference in the
material extralinguistic world, the absence of the denotatum which is of basic importance. ... Even in
those areas where the two cultures overlap and where the material extralinguistic world is identical,
the lexical units of the two languages are not different labels appended to identical notions. In the
overwhelming majority of cases, the designata are differently organized in the two languages.’

be found for each such case, but it must also be conveyed to the dictionary user, whether
explicitly or implicitly, that the status of lexicographic equivalents is not always equally 
secure—in particular, that not every equivalent suggested by the dictionary will be 
an equally suitable translation of a particular headword. This may necessitate supple- 
menting lexicographic equivalents by additional information, a process which often 
requires considerable ingenuity on the part of the lexicographer. We shall examine 
some instances of this in the following sections. 


9.4 MAJOR SOLUTIONS 
TO THE PROBLEM OF ANISOMORPHISM 


9.4.1 Cognitive and Translational Equivalence 


Assuming that a perfect equivalent for an SL headword could be found in the target 
language, what exactly would it look like? For a start, it should correspond to the SL 
item in both its denotational and connotational meaning. Denotational identity means 
that the headword and the equivalent should have the same extension (i.e. be applicable 
to the same class of entities or phenomena in the world) as well as the same intension 
(i.e. share a defining property or properties). Connotational identity means, among 
other things, that they should evoke the same associations, be a perfect fit in terms of 
style and register, and express the same attitude on the part of the speaker or writer.⁷

To take a simple example, the extension of the word grandmother is the set of all 
grandmothers, while its intension is what all grandmothers have in common: being the 
mother of somebody’s parent. It might seem that achieving denotational identity in a 
case such as this should pose no problems. However, in Swedish, there is no one perfect 
equivalent of grandmother, but two partial ones, each having a smaller extension and a 
richer intension than the English word: farmor ‘father’s mother’ and mormor ‘mother's 
mother’. 
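
The relation between extension and intension in this example can be made concrete in a
short sketch. The Python fragment below is an illustration only: the toy family, the
class Person, and the function names (is_grandmother, is_farmor, is_mormor, extension)
are assumptions introduced for the sketch and are not part of any lexicographic tool.
An intension is modelled as a defining condition, an extension as the set of
individuals who satisfy it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    """A member of a hypothetical toy family tree."""
    name: str
    mother: Optional["Person"] = None
    father: Optional["Person"] = None


# Toy data (invented for illustration).
anna = Person("Anna")
berit = Person("Berit")
carl = Person("Carl", mother=anna)                 # Anna's son
dora = Person("Dora", mother=berit)                # Berit's daughter
erik = Person("Erik", mother=dora, father=carl)    # child of Dora and Carl
PEOPLE = [anna, berit, carl, dora, erik]


def is_grandmother(p: Person) -> bool:
    """Intension of English 'grandmother': mother of somebody's parent."""
    return any(
        parent is not None and parent.mother == p
        for child in PEOPLE
        for parent in (child.mother, child.father)
    )


def is_farmor(p: Person) -> bool:
    """Richer intension of Swedish 'farmor': mother of somebody's father."""
    return any(c.father is not None and c.father.mother == p for c in PEOPLE)


def is_mormor(p: Person) -> bool:
    """Richer intension of Swedish 'mormor': mother of somebody's mother."""
    return any(c.mother is not None and c.mother.mother == p for c in PEOPLE)


def extension(condition) -> set:
    """Extension: names of everyone in the toy world who meets the condition."""
    return {p.name for p in PEOPLE if condition(p)}


grandmothers = extension(is_grandmother)
farmor_set = extension(is_farmor)
mormor_set = extension(is_mormor)
```

In this toy world each Swedish condition adds a requirement to the English one (a richer
intension) and therefore picks out a proper subset of the grandmothers (a smaller
extension), while the two subsets together cover the whole English extension.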

In most languages, of course, grandmother will have a denotationally identical 
equivalent, but the connotations may still differ (more so in the case of culturally dis- 
tant languages), if only because of the differences in status accorded to grandmothers 
in different societies. And that is not all. Apart from being denotationally and connota- 
tionally identical to the SL item, a perfect TL equivalent should have the same range of 
application, that is, act as a natural-sounding translation thereof in every possible con- 
text. This is not the case in Polish, which, to take care of the contexts covered by English 
grandmother, makes use of two related words, babka and babcia. The differences 
between them are partly connotational (and have to do with the fact that, historically, 


7 The terms denotation and connotation are used differently by different scholars (see, e.g., Cruse 
2006: 45). The perspective adopted here follows Lyons (1977: ch. 7). 

babcia is a diminutive of babka), but both can also occur in a perfectly neutral register
(e.g. in expressions such as ‘our grandmothers’ generation’, ‘in the times of our
grandmothers’, etc.). In most situations, babcia is the preferred option: it is the only
possibility as a form of address as well as in the phrase Dzień Babci ‘Grandmother's Day’,
and is generally favoured when talking or writing about one’s own grandmother. Babka is
found mostly in official documents, genealogies, and the like; those who still use it in
less formal situations are more likely to do so with reference to a distant ancestor or to
somebody else’s grandmother, especially a person they have never met.

If there can be no perfect interlingual equivalence for such a basic lexical item as 
grandmother, it stands to reason that less common and/or semantically more complex 
headwords will give rise to more asymmetries. Perfect equivalence, understood as 
one-to-one correspondence between lexical items of two languages, is no more than 
an ideal. Bilingual lexicographers set themselves more modest, but attainable goals: the 
kind of equivalence they operate with is not absolute, but rather a matter of degree 
along all the above-mentioned parameters. 

The term equivalence suggests two (or more) items being of equal value. But equal 
value relative to what? When dealing with the lexicon, it helps to distinguish cognitive 
equivalence, which involves the SL and TL items having equal value in their respective 
language systems, from translational equivalence, where equal value is manifested by 
the two items when embedded, respectively, in a concrete SL text and its TL translation. 
The former is of more relevance to linguistics, the latter to the theory and practice of 
translation—in bilingual lexicography, we need them both.

Being situated at the level of the language system, a cognitive equivalent is fairly gen- 
eral and, as such, appropriate as a translation of the SL item in many, though not all, 
contexts. Even when not insertable in a given context, it may still elicit in the dictionary 
user a translational equivalent appropriate on a particular occasion. This is possible 
thanks to the cognitive equivalent’s explanatory potential, that is, its ability to faith- 
fully represent the meaning of the SL item. Often, a cognitive equivalent is what springs 
to the bilingual speaker’s mind more or less immediately after they have been pre- 
sented with the SL item. Thanks to this obviousness and immediacy, cognitive equiva- 
lents tend to be identical in different dictionaries. For instance, Italian respirare will, 
in all probability, be accompanied by the English equivalent breathe, German immer 
by English always, and Spanish lleno by English full, no matter which dictionary for
the language pair in question we choose to consult. Additionally, cognitive equivalence 
appears to be a symmetrical relation: respirare = breathe in Italian-English dictionar- 
ies, breathe = respirare in English-Italian ones, and so on. It should therefore come as 
no surprise that a cognitive equivalent, whenever available, is the bilingual lexicogra- 
pher’s first choice. 

By contrast, a translational equivalent is situated at the level of the (written or spoken)
text: it produces a good translation when substituted for the SL item in a particular


5 The following typology of equivalence, inspired by Zgusta (1971) and (1987), has been developed in
a number of earlier publications; for details, see Adamska-Sałaciak (2006, 2010, 2011).
context without being wholly identical to it in meaning. A translational equivalent (or
a selection of translational equivalents) is what the lexicographer has to settle for when 
a cognitive equivalent cannot be found. Because of the lack of an upper limit on the 
number of contexts in which a given SL lemma may occur, the number of potential 
translational equivalents may be quite high. Obviously, a bilingual dictionary can only 
give a few such equivalents per sense, making sure that they are usable in the most typi- 
cal contexts in which the SL item can be found. Given their primary function, transla- 
tional equivalents are especially welcome in dictionaries, or dictionary parts, aimed at 
assisting in the user’s foreign language production, that is, those going from L1 to L2. 

Take the Portuguese noun saudade, which expresses a complex concept not lexi- 
calized in English (there is a lexical gap in English in the place where Portuguese 
has saudade). As a result, saudade is rendered in an online Portuguese-English
dictionary (<http://dictionary.reverso.net/portuguese-english/saudade>) by a string of
near-synonymous translational equivalents.⁹


saudade (desejo ardente) longing, yearning, (lembrança nostálgica) nostalgia¹⁰


Even taken together, the meanings associated with longing, yearning, and nostalgia do 
not exhaust the meaning of saudade, but they certainly do a better job of it than any one
of the three would on its own. At the same time, the dictionary user is given a choice of 
candidates for translating the Portuguese noun in context. 

Dictionaries offer translational equivalents not only for lexical items expressing such 
evidently culture-specific concepts as saudade, but also for more common ones. For 
example, English graceful in the sense ‘elegant in form, proportions, movement, expres- 
sion, or action’ (OED) is rendered in the Concise Oxford Spanish Dictionary (COSD) 
as: lleno de gracia, grácil; elegante; in the New Kosciuszko Foundation English-Polish
Dictionary (NKFD) as: pełen wdzięku, wdzięczny; elegancki. What the dictionaries
tell us is that, in some contexts, the best TL translation of graceful will be an adjec- 
tive which literally means ‘elegant’ (Sp. elegante, Pol. elegancki), in others, an adjectival 
phrase meaning ‘full of grace’ (Sp. lleno de gracia, Pol. pełen wdzięku), etc.


9.4.2 Explanatory and Functional Equivalence


Cognitive and translational equivalents form the core of any bilingual dictionary. Apart
from those, two more varieties of lexicographic equivalence can be distinguished: the 


9 Throughout this chapter, only those parts of dictionary entries are quoted which illustrate the
point being discussed. 

10 Although the presence in the entry of sense discriminators (desejo ardente and lembrança
nostálgica) might be taken to imply that Portuguese saudade is polysemous, the sense split is, in fact,
dictated by the non-existence of a single English equivalent. Such TL-based sense structure of the 
entry is quite common in bilingual dictionaries. 
explanatory and the functional. These come to the rescue when the lexicographer is not
able to come up with an established lexical unit of the target language corresponding 
to the SL item, not even one suitable only in a particular, narrowly delimited context. 

An explanatory equivalent is not an established TL unit, but a free combination. In 
terms of content, it is a succinct TL paraphrase of the SL item. If conveniently short and 
skilfully worded, it can sometimes work as a translation of the headword in context. 
When it does not—which is on most occasions—it may nonetheless help the dictionary 
user find a contextually appropriate translation of the SL item. 

The following examples of explanatory equivalents, taken from the English-French 
section of the Collins Robert French Dictionary, sixth edition (CRFD6), are occasioned 
by lexical gaps in French: 


spay enlever les ovaires de
squelch faire un bruit de succion
stride marcher à grands pas or à grandes enjambées


The activities denoted by English spay, squelch, and stride are clearly not 
culture-dependent. However, it so happens that in none of the cases does French have 
a corresponding verb of the same degree of specificity: spay is used only when talking 
about sterilizing female animals; onomatopoeic squelch imitates a very specific sound; 
stride denotes a particular manner of walking.¹¹

As an example of an explanatory equivalent motivated by a referential gap, consider 
the treatment of the Portuguese noun fado (in the musical sense) by a Portuguese- 
English dictionary (<http://dictionary.reverso.net/portuguese-english/fado>), and of 
the English phrase morris dancing by CRFD6: 


fado traditional song of Portugal 


morris dancing danse folklorique anglaise¹²


It will be clear from these examples that what is here called an explanatory equivalent,
whether resulting from a lexical or a referential gap, resembles what Ščerba saw
as the principal strategy of meaning explanation in an L2-L1 dictionary. However, as
already mentioned, an explanatory equivalent must be maximally succinct, so that it 
could, in principle, substitute for the headword in context. Put differently, TL explana- 
tory equivalents can be thought of as a subset of TL definitions of SL items. The fol- 
lowing examples—respectively, from CRFD6, COSD, and the Collins Concise Spanish 


11 In French, manner of motion is normally not coded by the verb, but expressed periphrastically by
an adverbial of manner. For a theoretical account, see, e.g., Slobin (2006).

12 Dictionaries often use italics for the presentation of what in our classification are explanatory
equivalents motivated by a referential gap, thereby distinguishing them from ‘proper’ equivalents, that 
is, established lexical units of the target language. 
Dictionary (CCSD)—are all TL definitions as envisaged by Ščerba, but not explanatory
equivalents according to our typology: 


mews flat petit appartement aménagé dans une ancienne écurie, remise etc. 
hit-and-run accidente en el que el conductor atropella a algn y se da a la fuga
stalker persona que está obsesionada con otra y la acosa constantemente con llamadas
telefónicas o siguiéndola a todas partes


As indicated at the beginning of this section, a perfect TL equivalent should be both a 
faithful representation of the SL item’s meaning and a good translation thereof in con- 
text—or, as Zgusta (1987: 238) put it, it should combine explanatory power with insert- 
ability. A cognitive equivalent does very well in the former department and moderately 
well in the latter. A translational equivalent first of all fulfils the insertability require- 
ment and only secondarily acts as a meaning indicator. An explanatory equivalent 
is rarely insertable, its primary function being to clarify the SL meaning. Providing 
explanatory equivalents thus makes sense mainly in dictionaries, or parts thereof, 
aimed at foreign language reception, that is, in L2-L1 ones.

From what has been said so far, it could be inferred that a translational equivalent is 
always preferable to an explanatory one. Broadly true as this is, there are exceptions. 
Sometimes an explanatory equivalent, though less readily insertable than a transla- 
tional one, is still worthy of inclusion, owing to its greater semantic precision. Thus, 
although the most common Polish translations of the English adjective challenging are 
ambitny ‘ambitious’ and wymagający ‘demanding’, large English-Polish dictionaries
tend to give an explanatory equivalent as well: in NKFD, stanowiący wyzwanie
‘constituting a challenge’ is in fact listed first, before the translational equivalent ambitny.

Another qualification that has to be made is that the boundary between transla- 
tional and explanatory equivalents is not sharp. In the discussion of English graceful, 
we encountered the Spanish equivalent lleno de gracia and the Polish pełen wdzięku,
both literally meaning ‘full of grace’. These are difficult to classify: they could be 
treated as explanatory by virtue of the fact that they spell out the SL word’s meaning, 
but they could also be considered translational, because they can be used to produce 
natural-sounding translations of English graceful in context. The decisive criterion, of 
course, is whether or not they are established lexical units of the target language. But 
this is precisely where the difficulty lies: on the one hand, they appear to be free com- 
binations, since in constructions meaning ‘full of x’, x could, in principle, stand for 
anything; on the other hand, Polish pełen wdzięku is such a frequently occurring
combination that it could well be treated as fully established. As we shall see later, it is not
only the boundary between these two types of equivalence that is to some extent fluid. 

Our last type, functional equivalence, is the most distinct of the four: it is a relation 
which holds, not between the meanings of individual lexical items, but between the 
meanings of longer stretches of text. We can talk about functional equivalence when 
the TL text portion contains either a TL word of a different grammatical category than
the SL headword to which it is meant to correspond or no element whatsoever
corresponding directly to the headword.

Traditionally privileged in translatology, functional equivalence was, until recently, 
largely ignored by lexicographers. This is now changing rapidly, thanks in no small 
measure to the availability of large electronic corpora. Aided by a corpus, the
lexicographer can either use one of the contextual occurrences of the SL word contained therein
or construct a corpus-inspired phrase or sentence translatable into the target language 
without requiring a lexical equivalent for the SL word itself. Functional equivalence 
is thereby achieved between the SL phrase or sentence and its TL translation, as in the 
treatment of the intransitive use of the English verb scrabble in CRFD6: 


scrabble a (also scrabble about, scrabble around) to scrabble in the ground for sth
gratter la terre pour trouver qch • she scrabbled (about or around) in the sand for the
keys she had dropped elle cherchait à tâtons dans le sable les clés qu’elle avait laissé
tomber • he scrabbled (about or around) for a pen in the drawer il a tâtonné dans le
tiroir à la recherche d’un stylo b (= scramble) • to scrabble to do sth chercher à faire qch
au plus vite • his mind scrabbled for alternatives il se creusait la tête pour trouver au
plus vite d’autres solutions


Sometimes, short of giving a discursive, metalinguistic explanation of a particular 
headword’s behaviour, aiming for functional equivalence is the only option avail- 
able to the bilingual lexicographer. This happens when the SL item is a member of a 
grammatical category non-existent in the target language, such as Zulu ideophones,¹³
a word class without a counterpart in English. The following entry (quoted after De 
Schryver 2009: 40) demonstrates how the problem has been solved by the Oxford 
IsiZulu-IsiNgisi/English-Zulu Isichazamazwi Sesikole/School Dictionary (OZESD):


ngqi ideophone 1 (of tightness, of security, of holding firm) • Wase ezivalela yena
endlini ethi ngqi. « He then locked himself securely inside the house. • Yalibamba iqakala.
Yalibamba yalithi nje ngqi. « It grabbed his ankle. It grabbed it very tightly.
2 (of getting stuck) • UVika wayengasakwazi nakunyakaza. Nengqondo yayithe nje ngqi.
« Vika could no longer move. His mind too got stuck.


No decontextualized English equivalents of the headword are offered here. Instead, 
the sense discriminators—(of tightness, of security, of holding firm) in 1 and (of getting 
stuck) in 2—are followed directly by examples of use and their English translations. 
Equivalence thus obtains primarily between whole sentences in Zulu and English, and 


13 An ideophone is ‘a word, often onomatopoeic, which describes a predicate, qualificative or
adverb in respect to manner, colour, sound, smell, action, state or intensity’ (Doke 1935; quoted after 
De Schryver 2009: 35). The functions of this part of speech in Bantu languages resemble, but are not 
identical to, those of the adverb in English. 
only to a limited extent between Zulu ngqi and whatever English phrase (shaded here
for ease of reference) corresponds to it in a particular translation. 


9.4.3 Cultural Counterparts and Lexical Innovations 


All the equivalence types except the cognitive are, in a sense, repair strategies for
dealing with cases when a default, cognitive equivalent is unattainable. Even so, there still
remain situations when neither providing a series of translational equivalents, nor sup- 
plying an explanatory equivalent, nor even extending the context to obtain functional 
equivalence between larger units, produces a satisfactory result. In such cases, one of 
two auxiliary strategies may prove helpful: suggesting a cultural counterpart or pro- 
posing a lexical innovation. 

Suggesting a cultural counterpart is one of the solutions employed to bridge a refer- 
ential gap. It consists in pointing to an institution, government body, etc., that plays a 
roughly similar role in the TL speech community to that played by the referent of the 
headword in the SL culture. A correspondence is postulated here which obtains pri- 
marily between real-world phenomena and only secondarily between their names. 
There is some denotational similarity, but no identity: the names in the two languages 
are not names of the same object or phenomenon, but merely of functionally similar 
objects or phenomena. For example, the British institution called the Royal Society and 
the German exam called Abitur have been matched with the following cultural
counterparts (respectively, French and English) in CRFD6 and the Collins German College
Dictionary (CGCD): 


the Royal Society ≈ Académie des sciences
Abitur ≈ A levels¹⁵


Suggesting a cultural counterpart is an obvious alternative to giving an explanatory 
equivalent or a TL definition. The strategies can also be combined, as in the entry for 
Abitur in CGCD, which, in fact, looks like this: 


Abitur German school-leaving examination, ≈ A levels (BRIT)


This solution takes up more space, but users may prefer it to being offered either an 
explanation or a cultural counterpart on its own. Here, it is additionally motivated by 
the fact that the cultural counterpart functions in only one milieu where the target lan- 
guage is spoken (Britain) rather than in the whole of the English-speaking world. 


14 This is why we prefer to talk about cultural counterparts rather than cultural equivalents.
Elsewhere (e.g. Svensén 2009: 274; CRFD6: xxii), the term nearest cultural equivalent is used. 

15 The sign ≈ is employed by some dictionaries to signal the approximate nature of the
correspondence proposed. 
Very occasionally, when none of the more common solutions appears viable, the
lexicographer may choose to propose as an equivalent a word that is not—or, at least, 
not yet—established in the language. This amounts to sanctioning an innovation in 
the form of a borrowing from the source language. The borrowing in question must 
already have some footing in the target language, otherwise the chances of an innova- 
tion becoming accepted merely on the strength of it featuring in a bilingual dictionary 
would be very slim indeed. 

Linguists distinguish between two kinds of borrowing: lexical, when both the form 
and the meaning of a foreign word are transplanted into the recipient language, and 
semantic, when a foreign meaning gets associated with an existing TL word. The fol- 
lowing equivalents of Russian words, given by the Concise Oxford Russian Dictionary 
(CORD), are instances of lexical borrowing from Russian into English: 


кулебяка kulebyaka
каша kasha
катюша Katyusha¹⁶


Semantic borrowing usually takes place when the TL word to which a new meaning 
gets attached has already been functioning as an equivalent of the SL word in another 
sense. A familiar case is that of the words for ‘mouse’ (‘a small rodent’) which, under the
influence of English mouse, have in many languages developed the extra sense of ‘a 
small mobile manual device that controls movement of the cursor and selection of 
functions on a computer display’ (<http://www.merriam-webster.com/dictionary/mouse>).

Occasionally, semantic borrowing may occur between words which, to start with, 
are not equivalent in any of their senses, but merely resemble each other formally, 
thus constituting a pair of false friends. This happened recently with the Polish word 
klisza, which once meant only ‘photographic film’ or ‘printing plate’, but now, under 
the influence of English cliché, is also used, especially in book and film reviews, in 
the sense of ‘a hackneyed theme, image, or situation’. When, in the early 2000s, lexi- 
cographers compiling the English-Polish volume of the NKFD decided to include 
klisza among the equivalents of English cliché, it was a case of sanctioning an inno- 
vation, as the new sense of klisza had not yet been recorded by monolingual diction- 
aries of Polish. 

The process of semantic borrowing may also result in a loan-translation (calque). 
This happens when the components of a formally complex SL item (usually a com- 
pound) are literally translated into their TL equivalents, producing a new lexical item 
whose overall meaning is identical to that of the SL model. Some of the numerous 
calques from German into English have no doubt been helped on their way by the 


16 We shall have more to say about these in Section 9.5. 
practice of German-English dictionaries which, by acknowledging them as possible
equivalents, boosted their status as legitimate lexical items of English. Here is
an example (from CGCD) of a loan-translation which the lexicographer must have 
considered innovative, given that they have additionally provided an explanatory 
equivalent: 


Geisterfahrer ghost-driver (US), person driving in the wrong direction¹⁷


As already noted, the boundaries between various strategies for overcoming aniso- 
morphism are not sharp. A cognitive equivalent can sometimes act as a translation of 
the SL headword (thus instantiating two equivalence types at once), while an explana- 
tory equivalent may occasionally be difficult to distinguish from a translational one 
and, on other occasions, from a definition in the target language. Consequently, our 
four equivalence types are best regarded as convenient reference points along an 
equivalence continuum rather than as mutually exclusive categories with sharply 
delineated boundaries. The rationale for distinguishing them in the first place is that 
having such points of reference makes it easier to analyse the contents of bilingual 
dictionaries. 

It also needs to be emphasized that the choice of a lexicographic solution for a par- 
ticular case of anisomorphism is not dictated solely by the properties of the lexical sys- 
tems of the languages dealt with. For instance, whether the lexicographer decides to 
suggest an incipient borrowing as an equivalent candidate, or whether they settle for 
the safer option of providing an explanatory equivalent, will depend on the diction- 
ary’s target audience and on the purpose for which the entry is most likely to be con- 
sulted. Both fado and saudade feature as headwords in the monolingual Oxford English 
Dictionary (OED), so it might be tempting to conclude that the process of their being 
borrowed into English has already been completed. But Portuguese-English diction- 
aries do not routinely give fado and saudade as English equivalents of the Portuguese 
words in question. This is because, as has been stressed repeatedly, the primary task of 
a bilingual L2-L1 dictionary is to explain L2 meanings. An English speaker looking
up fado in a Portuguese-English dictionary will, in all probability, want to know what 
the word means, and may therefore feel short-changed when informed merely that its 
English equivalent is fado. This is not to say that, in similar circumstances, a loanword 
must never be proposed as an equivalent—merely that, when it is, it should be accom- 
panied by some additional information. How this information can be presented is the 
subject of the next section. 


17 That ghost-driver is not fully at home in English can also be gleaned from the fact that it is not
given as an equivalent of Geisterfahrer in other German-English dictionaries, including those larger 
than CGCD, such as the Oxford German Dictionary (OGD). 
9.5 SUPPLEMENTARY MEANING-ELUCIDATING STRATEGIES


When equivalent provision alone does not fully succeed in rendering an SL meaning, 
the bilingual lexicographer can supplement the proposed equivalent(s) with additional 
semantic information. Such information frequently takes the form of a gloss (following
the equivalent, enclosed in parentheses, usually italicized), a usage label (placed before 
the equivalent, in parentheses, often abbreviated), or an explanatory note (situated out- 
side the entry proper, usually in a box immediately below it). It may also be conveyed 
implicitly, by a well-chosen example of usage. 

A gloss is a comment on the headword, on the equivalent, or on both. Let us look at 
the three Russian-English entries from CORD again, this time reproducing also the 
glosses and labels which accompany the proposed equivalents: 


кулебяка kulebyaka (pie)
каша kasha (dish of cooked grain or groats)
катюша (mil.; coll.) Katyusha (lorry-mounted multiple rocket launcher)


The glosses explain the meaning of the Russian headwords and, at the same time, of 
the English equivalents, which are lexical borrowings from Russian not universally 
known among speakers of English. The contents of the glosses exhibit different degrees 
of specificity: from a mere provision of a hyperonym (pie) to a fully-fledged definition 
(lorry-mounted multiple rocket launcher). A gloss can also contain a synonym, a hypo- 
nym, a typical collocate, an indication of the relevant semantic field—anything that 
might help the user grasp the SL meaning. It is thus not so much the contents which 
make a gloss, but the function, the placement (after the equivalent, in parentheses), and 
the typography (italics). 

Considered from the functional point of view, glosses can be either explanatory, 
like the ones we have just looked at, or disambiguating. Disambiguation is called for 
whenever an equivalent is polysemous and therefore potentially ambiguous out of 
context. A disambiguating gloss helps to uniquely identify the sense in which the TL 
item is to be interpreted. The following entry from the Longman Słownik Współczesny
Angielsko-Polski, Polsko-Angielski (LSW), a bilingual pedagogical dictionary for Polish
learners of English, illustrates this use of glosses: 


choppy wzburzony (o wodzie)


Apart from ‘choppy’ or ‘rough’, Polish wzburzony can also mean ‘agitated’ or
‘perturbed’. The addition of the gloss o wodzie ‘of water’ ensures that the equivalent is
interpreted correctly, its other possible senses (meaning potentials) remaining inactive.
In order to show how labels contribute to the presentation of the headword’s
semantics, let us return once more to the катюша entry. It contains two labels: mil.
(short for military) and coll. (for colloquial), the former of which can be classified as
encyclopaedic, the latter as metalinguistic. Encyclopaedic (also known as domain,
field, or thematic) labels, for example technical, law, music, culinary, specify the
subject field in which the headword (in a particular sense) is used. Metalinguistic
labels, for example informal, literary, slang, Irish, indicate the register, style, or
language variety to which the headword (or its sense) belongs. Thus, in addition to
narrowing down the headword’s meaning, labels can also discriminate between 
different senses of a polysemous headword. Since the use of labels is in principle 
identical in bilingual and monolingual dictionaries, no further illustrations seem 
necessary at this point. One thing that should perhaps be added is that not all labels 
are equally indispensable: in the катюша entry, coll. is crucial, while mil. merely
duplicates information conveyed by the explanatory gloss. Such redundancy is not 
necessarily a bad thing, as it increases the chances of the user noticing and absorbing 
what they are being told. 
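
The placement conventions for labels and glosses described in this section can be sketched as a small data model. This is only an illustrative sketch, not the format of any actual dictionary database: the class names Equivalent and Entry, their fields, and the render function are all assumptions introduced here. The rendering simply follows the layout described above, with labels in parentheses before the equivalent, the gloss in parentheses after it, and an optional note kept outside the entry proper.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Equivalent:
    """One TL equivalent plus its supplementary information (hypothetical model)."""
    text: str
    labels: List[str] = field(default_factory=list)  # usage labels, e.g. "mil.", "coll."
    gloss: Optional[str] = None                      # comment placed after the equivalent


@dataclass
class Entry:
    """A simplified bilingual entry (hypothetical model)."""
    headword: str
    equivalents: List[Equivalent]
    note: Optional[str] = None  # explanatory note, displayed outside the entry proper


def render(entry: Entry) -> str:
    """Lay out an entry: labels before each equivalent, gloss after it, note below."""
    parts = []
    for eq in entry.equivalents:
        chunk = ""
        if eq.labels:
            chunk += "(" + "; ".join(eq.labels) + ") "
        chunk += eq.text
        if eq.gloss:
            chunk += " (" + eq.gloss + ")"
        parts.append(chunk)
    line = entry.headword + " " + ", ".join(parts)
    if entry.note:
        line += "\n\n[NOTE] " + entry.note
    return line


# Two entries quoted in this section, re-encoded in the sketch model.
katyusha = Entry(
    "катюша",
    [Equivalent("Katyusha", ["mil.", "coll."], "lorry-mounted multiple rocket launcher")],
)
choppy = Entry("choppy", [Equivalent("wzburzony", gloss="o wodzie")])

print(render(katyusha))
print(render(choppy))
```

Keeping labels and glosses as separate fields, rather than as part of the equivalent string, reflects the point made above that it is function and placement, not content, that make a gloss a gloss.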

Like glosses and labels, notes can also perform manifold functions. They arbitrate 
in questions of usage, for instance, by helping to distinguish between near-synonyms, 
by highlighting the headword’s irregular grammatical behaviour, or by warning the 
language learner against mistaking false friends for bona fide equivalents. The fol- 
lowing notes from CRFD6, placed, respectively, at the entry for fastidieux in the 
French-English part, and at the entry for fastidious in the English-French part of the 
dictionary, contain such false-friend alerts: 


fastidieux ne se traduit pas par fastidious, qui a le sens de ‘méticuleux’ 


fastidieux does not mean fastidious, but ‘boring’ 


A note may also further elucidate an SL meaning, for instance, by showcasing
connotational differences between the headword and the proposed equivalent(s), as in this
fragment of the note at banlieue (in CRFD6), which was evidently motivated by the 
lexicographer’s conviction that the English equivalents given in the entry (suburbs, 
outskirts) might be misleading without proper qualification: 


Banlieue 


The connotations of suburbia in France are quite different from those that prevail in 
many English-speaking countries. For historical, economic and social reasons, many 
suburbs of large French towns have become severely depressed in recent years; the 
word banlieue thus tends to conjure up images of violence and urban decay, and has 
similar connotations to the English term “inner city”. 


Some notes are purely encyclopaedic in character, supplying information about the 
(usually culture-specific) real-world referent of the SL item, as in this passage about 
morris dancing, which follows the entry for morris (dance) in the English-French
section of CRFD6:


Morris Dancing 


Le Morris dancing est une danse folklorique anglaise traditionnellement réservée aux
hommes. Habillés tout en blanc et portant des clochettes, ils exécutent différentes
figures avec des mouchoirs et de longs bâtons. Cette danse est très populaire dans les
fêtes de village.


Sometimes a note is all there is, the dictionary carrying no separate entry or sub-entry 
for a particular SL item. The note for the American Dream reproduced below comes 
from COSD, where it follows the entries for American (adjective and noun), neither of 
which features the phrase in question (and thus the translational equivalent el sueño
americano appears only inside the note):


the American Dream 


En Estados Unidos, el sueño americano es la creencia, que cualquier persona que
trabaje duro puede alcanzar el éxito económico o social. Para los inmigrantes y las
minorías, este sueño también incluye libertad e igualdad de derechos.


Like glosses and labels (see катюша above), notes and labels can also be combined for
optimum effect, as in this fragment of a CRFD6 entry, which offers both a usage label, 
indicating the pejorative character of a particular sense of the headword, and a note 
elaborating on its etymology and connotations: 


beauf ... b) (péj) narrow-minded Frenchman with conservative attitudes and tastes


Beauf


The word beauf is an abbreviation of «beau-frère» (brother-in-law). It is a pejorative
and humorous term used to refer to stereotypical ordinary Frenchmen who are per- 
ceived as being somewhat vulgar, narrow-minded, and chauvinistic. 


All the lexicographic devices discussed so far present semantic information in a more 
or less explicit manner. In examples of usage, by contrast, information on meaning 
is normally conveyed implicitly, the main exception being example sentences which 
contain a definition of the word being illustrated. Although examples of usage are not 
a standard feature of bilingual lexicography, more and more bilingual dictionaries,
especially pedagogical ones, take advantage of them these days. Most examples sup- 
port information given earlier in the entry, but some expand on it or qualify it, usually 
by introducing important exceptions, for example by showing that in certain circum- 
stances a different translational equivalent is needed, or that the headword is some- 
times omitted in translation. 
9.6 CONCLUSION


Bilingual dictionaries have arisen in response to people’s need for establishing mean- 
ing correspondences between languages. It is only after a dictionary user has identified 
such provisional correspondences that they can proceed with translating, producing 
their own utterances in the foreign language, or any other activity which prompted 
them to consult a bilingual dictionary in the first place. Depending on the headword, 
semantic information in the dictionary can be conveyed in different ways. The most 
important meaning-elucidating strategy is the provision of lexicographic equivalents, 
which, as argued above, form an internally diversified category. In order to ensure that 
the explanation of SL meanings is successful, equivalents are routinely complemented 
by glosses, labels, notes, and, increasingly often, examples of usage. 

There are a number of issues we have not touched upon here. For some words (e.g. 
certain concrete nouns or prepositions in their primary spatial senses), presenting 
meaning ostensively—via pictures or photos—can be effective. For lexical items whose 
meaning is mainly pragmatic (e.g. good night, sorry, cheers), good translational equiva- 
lents are needed or, in their absence, a metalinguistic explanation of the function of a 
particular expression. Finally, there is the diversified class of multi-word units. Often,
the same strategies are applicable here as in the case of single-word units.¹⁸ However,
especially with idioms, the question arises whether the SL meaning is best rendered
by a TL unit which is also an idiom or whether a TL paraphrase should be used. The
latter solution seems safer in most cases, because idioms of exactly the same meaning 
are extremely rare. A comprehensive discussion of this particular issue can be found in 
Dobrovol’skij and Piirainen (2005). 


13 As argued by Zgusta (1971: 157), ‘[t]he lexicographer is concerned with... those units of 
language... that have unified and distinctive lexical meanings of their own, irrespective of their 
formal structure or of the number or nature of their own constituent parts’. 
In the treatment of words the historical principle will be uniformly 
adopted;—that is, we shall endeavour to show more clearly and fully than 
has hitherto been done, or even attempted, the development of the sense 
or various senses of each word from its etymology and from each other, so 
as to bring into clear light the common thread which unites all together. 
(Philological Society 1859, 4.) 


10.1 INTRODUCTION 


THE ‘historical principle’ as understood by the authors of the Philological Society's 
Proposal for the Publication of a New English Dictionary was clearly enough defined 
that it could be named before being explained, but new enough that it needed explana- 
tion. By 1884, the first fascicle of the Society's dictionary was published as A New English 
Dictionary upon Historical Principles (NED: now the Oxford English Dictionary, OED), 
and these principles shaped every entry: at its heart was a paragraph of dated quotations 
arranged in chronological order, the first being the earliest instance of the word which 
had come to the editors’ hands, and the last being either the latest available instance of 
an obsolete word or an example of its recent or current usage. When an entry had mul- 
tiple senses, these were likewise ordered chronologically, except in cases where a depar- 
ture from strict chronological order was thought to bring out the sense-development of 
the word. The set phrase historical principle(s) was not defined in the dictionary’s entry 
historical. In 2012, the newly-published revision of that entry in OED Online still did 
not give special attention to historical principle(s), or indeed to historical dictionary or 
historical lexicography: these phrases were evidently felt to be transparent, unlike 
historical grammar and historical linguist, which were given their own definitions in section S3 
of the entry. Historical lexicography does appear in a quotation of 1986 from the journal 
Dictionaries under sense 2c, the definition of which begins ‘Of an area of study: based on 
an analysis of development over a period of time; in connection with history; from the 
point of view of a historian’; the full text of the quotation is ‘Such reverification ... has 
been a hallmark of Oxford historical lexicography’, and its brevity suggests the perceived 
transparency of the form which it illustrates. 

Historical dictionaries are indeed dictionaries based on an analysis, or on a great num- 
ber of separate analyses, of development over a period of time. But the reader of the entry 
historical as revised in 2012 who turns to the corresponding entry in the second edition 
of the OED, itself lightly revised from the entry published in a fascicle of 1898, will at 
once see that the techniques and principles of historical lexicography have changed even 
as applied to successive editions of the same dictionary. Not only is more evidence for 
the use of historical handled in the revised entry, and with a greater richness of seman- 
tic discrimination, but there is a dramatic revision of the thought behind the entry, no 
less dramatic for being understated. So on the one hand two earlier quotations are pro- 
vided, both from the fifteenth-century translation of Ranulf Higden’s Polychronicon in 
British Library MS Harleian 2261. These stand at the head of two different senses. The 
Latin originals (the adjective historicus ‘historical’ and the noun historicus ‘historian’) 
are given in both cases—and this is done to show that the English word was polysemous 
from its first use, and that its polysemy is formed on developments which had taken place 
in the corresponding word family in Latin. Likewise, in section S3 of the entry, a number 
of the special uses of historical are shown to respond or relate to developments in other 
European languages: historical faith (1531) is after post-classical Latin fides historica (1519); 
historical geology (1823) is after French géologie historique (1823, in a work by Alexander 
von Humboldt, written in French and translated immediately into English); historical lin- 
guist (1876) is after Russian istorik-lingvist (1873); historical materialism (1892, in a work 
by Engels) is comparable with German historischer Materialismus (1893). Whereas the 
historical principles understood in the foundational document of 1859 were expected ‘to 
bring into clear light the common thread which unites all together’ in an entry telling the 
organic story of an English word, those informing the entry of 2012 tell part of the story of 
a developing group of words in multiple European languages. 

The changes in technique and principle evident even in this incomplete account of 
the revision of one OED entry are part of a larger, and ongoing, story of lexicographers’ 
engagement with the development of vocabulary over time. This chapter will present a 
history of this engagement, with particular attention to the origins of the historical prin- 
ciples invoked by nineteenth-century and subsequent lexicographers, and will turn at 
its end to some current issues in historical lexicography—not least the one at which we 
have just glanced, namely the handling of the results in one language of developments in 
which multiple languages partook concurrently.1 


1 I am grateful to Henri Béjoint, Philip Durkin, Susan Rennie, and Christopher Stray for their 
comments on this chapter and its subject-matter, and to Anne Dykstra for his gift of one of its sources. 
10.2 HISTORICALLY ORIENTED 
DICTIONARIES BEFORE THE NINETEENTH 
CENTURY 


In a number of ancient dictionary traditions, historically oriented lexicography came 
before any other kind. This was true, for instance, of the first wordlists of Sanskrit, which 
explained difficult archaic language from the Vedas, and of Greek, which explained 
the language of Homer for Hellenistic readers.2 Some of these early wordlists, together 
with some of the bilingual wordlists at the heads of other lexicographical traditions, are 
dated and listed in the appendix to this volume. They were not fully historical, because 
they were not concerned to show how contemporary and archaic words had developed, 
but they were compiled in order to make some sense of historical change. Although the 
English-language lexicographical tradition began with Latin-English glossaries, and 
remained bilingual throughout the Middle Ages, there were sixteenth-century word- 
lists of Old and Middle English. The former might be called bilingual on the grounds 
that Old English was really a different language from early modern English (whereas 
Homeric Greek or Middle English were not so different from the language varieties of 
their first lexicographers), but the first substantial Old English wordlist, compiled by 
Lawrence Nowell before 1567, cites a number of contemporary dialect reflexes of Old 
English words, and so it tells, or at least suggests, a narrative of historical development. 
The lexicography of Old and Middle English, Middle Scots, Old and Middle French, and 
other medieval language varieties perceived as being to some extent continuous with 
modern ones, provided a historical dimension to some seventeenth- and eighteenth- 
century European lexicography. So did the development of etymological dictionaries 
such as those of Sebastian Covarrubias Horozco (of Spanish in 1611), Gilles Ménage 
(of French in 1650 and Italian in 1669), Franciscus Junius (of English before 1677, not 
published until 1743), Stephen Skinner (of English, published in 1671 with an abridged 
English translation by Richard Hogarth in 1689), J. G. Wachter (of German in 1737), and 
Johann Ihre (of Swedish in 1769). These etymologica did not, of course, offer narratives 
of continuous historical development any more than dictionaries of archaic language 
varieties did. 

Such narratives did, however, develop in early modern European lexicography. One 
of their points of origin is, I suspect, the medieval Hebrew dictionaries such as David 
Kimhi’s Sefer ha-shorashim (Book of Roots), which were structured so as to show the way 
in which multiple Hebrew words could be generated from each triconsonantal root. 
These were known to Christian scholars from the late fifteenth century onwards. The 
great dictionaries of Latin and Greek compiled respectively by Robert and Henri Estienne 


2 For Sanskrit, see Patkar (1981: 1-6) (and for the problems of dating, Cardona 1976: 270-3); for Greek, 
Pfeiffer (1968: 90-2). 
in the sixteenth century were similarly arranged in a semi-etymological order, a given basic 
word being followed immediately by its supposed derivatives (Considine 2008: 46-7, 85-6). 
Although both Estiennes were well aware that some texts were earlier than others, they did 
not attempt to sort illustrative quotations chronologically: the structure of their dictionar- 
ies, like that of Hebrew dictionaries, told a story of morphological development, but not of 
the historical development of the senses of a given lexeme. Florentine philologists as early as 
the sixteenth century had been capable of placing the texts which they studied into chrono- 
logically sequenced groups, but this historical knowledge was likewise not translated into 
the historical sorting of quotations in dictionaries like the Vocabolario degli Accademici della 
Crusca of 1612. As the seventeenth century went on, narratives of morphological develop- 
ment continued to engage the interest of lexicographers, especially in Germany, where sev- 
eral dictionary projects were planned on the semi-etymological Stammwortprinzip. The first 
edition of the Dictionnaire de l’Académie française (1694) likewise had a semi-etymological 
structure—described in its prefatory material as ‘pleasant and instructive ... because one 
sees, if the expression is permissible, the history of the word, and observes its birth and its 
progress’—although the second edition, of 1718, was rearranged in strict alphabetical order.4 
As late as 1789-94, the first edition of the Slovar’ Akademii Rossiiskoi, the six-volume dictionary 
of the Russian Academy, was arranged semi-etymologically, by the considered decision 
of its project director, Princess Ekaterina Dashkova. 

In a preface of 1690 to the Dictionaire universel of Antoine Furetière, Pierre Bayle imagined 
a dictionary in which ‘the history of words could be included—that is to say, the period 
of their reign and of their decadence, with the changes of their meaning’. He added that ‘It 
would be necessary to do with respect to these old words what is done in dictionaries of the 
dead languages, namely to quote extracts from some author who has used them.’5 Brilliantly 
innovative as Bayle’s thinking was, he did not go so far as to imagine dictionary entries based 
on chronologically ordered sequences of quotations which would illustrate the full his- 
tory of a given word. Similar ideas to his were proposed sporadically in the next century. 
So, Jean-Baptiste de Lacurne de Sainte-Palaye wrote in a pamphlet of 1756 announcing the 
preparation of a dictionary of medieval French that by illustrating each word with multiple 
examples in unmodernized spelling, he would show ‘the different stages through which the 
same word has passed, receiving a number of successive alterations in its pronunciation, its 
spelling, and so on’, which would be ‘like links in a chain, bringing us nearer and nearer to the 
original of the word which we use today’.6 He was followed in the 1770s by Voltaire and the 


3 For the academy tradition as a whole, see Considine (2014b) (on which this paragraph and the next 
are based); see also Zgusta (2006: 6-19). 

4 Dictionnaire de l’Académie française (1694) sig. é1r ‘agreable et instructif ... parce qu’on voit s’il faut 
ainsi dire l’Histoire du mot, et qu’on en remarque la Naissance et le Progrez’. 

5 Dictionaire universel (1690) 1: sig. ***2r, ‘On y pourroit inserer l’Histoire des mots, c’est à dire, le 
temps de leur regne et celuy de leur decadence, avec les changemens de leur signification. Il faudroit 
observer à l’égard de ces vieux termes ce qu’on pratique dans les Dictionaires des langues mortes, c’est de 
cotter les passages de quelque Auteur qui les auroit employez.’ 

6 Sainte-Palaye (1756: 16) ‘Les différents degrés par lesquels le même mot a passé, en recevant plusieurs 
changements successifs dans sa prononciation, dans son orthographe, etc. sont autant de chaînons qui 
nous conduisent de proche en proche à l’origine du mot dont nous nous servons aujourd’hui.’ 
extraordinary blind lexicographer Marie-Charles-Joseph de Pougens (see Wexler 1959: 95-9). 
In the same decade, Johann Christoph Adelung announced in the preface to his Versuch eines 
vollständigen grammatisch-kritischen Wörterbuches der hochdeutschen Mundart that his rule 
for the arrangement of senses would be to order them ‘in conformity with how they appear to 
have developed from each other’. He acknowledged, though, that ‘Indeed, this law could not 
be followed everywhere, because the first meaning of a given word ... is no longer available, or 
because some rungs from the ladder of meanings have been lost, or lie hidden in the dialects. 
In this case the meanings can, indeed, only be ordered conjecturally.’7 Adelung could see that 
the presentation of the morphological development of a word is jejune in comparison with the 
presentation of its semantic development. That vision did not bring him all the way to the his- 
torical principles of the nineteenth century: for him, the primary meaning of a word could be 
determined by reason—it was, for instance, naturally concrete rather than figurative—rather 
than by the examination of dated evidence. But the diachronic element in his dictionary was 
stronger than that in any previous general monolingual dictionary had been. 


10.3 THE EMERGENCE OF MODERN 
HISTORICAL PRINCIPLES 


In 1808, the Scottish antiquary John Jamieson published the two volumes of his 
Etymological dictionary of the Scottish language, which treated Scots from the earliest 
available materials to Jamieson’s own day (see Rennie 2012). This was primarily a dic- 
tionary for the readers of older Scottish works of literature, ‘necessary, not merely for 
illustrating their beauties, but in many instances even for rendering them intelligible’ 
of medieval languages. What set it apart from them was that, whenever possible, each 
entry was illustrated with quotations placed in chronological order, starting with the 
earliest attestation: ‘On every word, or particular sense of a word, I endeavour to give 
the oldest printed or MS. authorities’ (Jamieson 1808: v). These quotations were drawn 
from canonical and non-canonical texts, and were given precise references so that they 
could be looked up in the sources. Because it is founded on historical quotations used in 
this way, Jamieson’s dictionary has been described as ‘the first British dictionary which 
deserves to be called historical’.8 It may indeed have been the first dictionary in Europe to 


7 Versuch eines vollständigen grammatisch-kritischen Wörterbuches der hochdeutschen Mundart 
Adelung (1774-86, 1: xiv) ‘Die Bedeutungen ... sind ... geordnet ... wie sie vermuthlich aus und auf 
einander gefolget sind. Freylich wollte sich dieses Gesetz nicht überall befolgen lassen, weil die erste 
Bedeutung eines Wortes ... nicht mehr vorhanden ist, oder weil manche Sprossen aus der Leiter der 
Bedeutungen verloren gegangen sind, oder in den Mundarten verborgen liegen. In diesem Falle konnte 
die Bedeutungen freylich nicht anders als muthmasslich geordnet werden.’ Cf. Schrader (2012: 171-2). 

8 Aitken (1973: 38) ‘le premier dictionnaire britannique méritant le titre historique’. 
deserve that title, and perhaps it was the first in the world; at least one non-European 
lexicographical tradition, the Chinese, included major dictionaries which quoted early texts 
(see Harbsmeier 1998: 65-82), but these were not, as far as I am aware, laid out to provide or 
illustrate a narrative of change. 

Four years after Jamieson’s dictionary appeared, the German classicist Franz Passow published 
a Programmschrift—an academic working paper—called Über Zweck, Anlage, und 
Ergänzung griechischer Wörterbücher (On the aim, plan, and completion of Greek dictionaries). 
The paper ranges widely: had Passow been a Victorian Englishman, he might perfectly 
well have called it ‘On some deficiencies in our Greek dictionaries’—and particularly in J. G. 
Schneider's Handwörterbuch der griechischen Sprache (1798-1799). At one point, it articulates 
as a general principle what Jamieson had already put into practice: 


the indication of the writer who has used a particular word must include a chrono- 
logical specification; it therefore follows that the first authority adduced for a word 
which comes into the language should not be the first in quality, the best, but rather the 
earliest.9 


Passow himself acted on this principle in the revised edition of Schneider's 
Handwörterbuch which he published in 1819-24, and restated it in a classic formulation 
in the foreword to the next edition, written in 1825: 


The dictionary should therefore set out, as it were, the life story of each single word 
in a conveniently ordered overview: it should state where and when each one was 
(as far as we know, of course) first hit upon, in which directions it developed ... and 
finally, at what period it disappeared from use.10 


The point expressed by the words ‘conveniently ordered’ is important: in his entry struc- 
ture, Passow saw the apparent logical development of one sense from another as fun- 
damentally important, even when the dates of the earliest attestations of the different 
senses were at odds with the logical order he discerned (Zgusta 2006: 27-38). 

It was doubtless in the preface of this important dictionary rather than in the grey 
publication of 1812 that Passow’s thought was most widely disseminated.11 The phrase 
‘die Lebensgeschichte jedes einzelnen Wortes’ occurs in later lexicographical discussions 


9 Passow (1812: 32) ‘die Nachweisung des Schriftstellers der ein jedes Wort gebraucht hat, eine 
chronologische Bestimmung enthalten müsse: es folgt also, dass nicht der erste, der beste; sondern der 
Älteste als erste Auctorität für das Wort, das zur Sprache kommt, angeführt werden muss.’ 
10 Passow (in Handwörterbuch der griechischen Sprache: xx) ‘Das Wörterbuch soll also gleichsam 
die Lebensgeschichte jedes einzelnen Wortes in bequem geordneter Ueberschaulichkeit entwerfen: 
es soll Auskunft geben, wo und wann ein jedes (natürlich immer soviel wir wissen) zuerst gefunden 
werde, in welchen Richtungen es sich fortbildete ... endlich um welche Zeit es etwa aus dem gebrauche 
verschwinde.’ 
11 Über Zweck, Anlage, und Ergänzung griechischer Wörterbücher appears, from the evidence of 
present-day library holdings, not to have circulated much beyond Germany. It was not reprinted in 
Passow’s Opuscula academica of 1835 or his Vermischte Schriften of 1843. 
from German-speaking Europe (e.g. at the founding of the Thesaurus Linguae Latinae 
project: Hartel 1893: 192). Henry Liddell and Robert Scott published their Greek-English 
lexicon, Based on the German work of Francis Passow in 1843, making Passow’s historical 
principles readily available to Anglophone readers, and adding new material to many of 
his entries (Zgusta 2006: 35-8). Hans Aarsleff has argued that ‘The OED depends for its 
lexicography entirely on Liddell and Scott's Greek-English Lexicon’ (1962: 419; cf. Zgusta 
2006: 72-5). To see that this claim is an exaggeration, one need only consider what the 
editors of the OED learned from Samuel Johnson, or indeed from Jamieson (for the lat- 
ter, see Silva 2000: 79). But it is likely that the thought of Passow was transmitted to 
the editors of the OED by Liddell and Scott, whatever its influence on their work. The 
foundational document of the dictionary, R. C. Trench’s ‘On some Deficiencies in our 
English Dictionaries, does not mention Passow, though the first English dictionary it 
names is that of Liddell and Scott (Trench 1860: 5). The first editor of what became the 
OED wrote in the first years of the dictionary project ‘that the theory of lexicography we 
profess is that which Passow was the first to enunciate clearly and put in practice suc- 
cessfully—viz., “that every word should be made to tell its own story”’ (Coleridge 1857: 
72). These are not the words of Passow, but of Liddell and Scott, who wrote in their pref- 
ace of 1843 that ‘Our Plan has been that marked out and begun by Passow, viz. to make 
each Article a History of the usage of the word referred to’, and then, after a discussion 
of the ways in which they had developed Passow’s entries, remarked ‘In most cases the 
word will tell its own story’ (Greek-English Lexicon 1843/1845: vii). 

Important as the ideas of Passow, as transmitted by Liddell and Scott, were for the 
OED, the example of Jamieson makes it clear that lexicography on historical principles 
was not simply an invention of Passow’s.12 The first dictionary on historical principles of 
a modern language was the Deutsches Wörterbuch (DWB) of Jacob and Wilhelm Grimm, 
undertaken in 1838, with a first fascicle published in 1852. Here again, Passow’s influence 
must not be over-estimated (cf. Zgusta 2006: 51-3). Wilhelm Grimm conceptualized 
the historical narrative of each entry not as a Passovian life history (Lebensgeschichte) 
but as a natural history (Naturgeschichte).14 These two powerful metaphors can both be 
traced back to the comparison, made as early as the fifteenth century (see Considine 
2010a: 61), between the ages of human life and the stages of development of a language. 
This comparison was used by a number of German classicists in the seventeenth and 
eighteenth centuries (Considine 2010a: 67-8, 70-1), and this suggests why it may have 
occurred to Passow. It was given new life by J. G. Herder in his prize essay ‘Ueber den 
Ursprung der Sprache’ (‘On the Origin of Language’), in which the development of lan- 
guage was conceived in organic terms. This aspect of Herder’s thought influenced that of 
Jacob Grimm (Aarsleff 1967: 152). Other philologists may have been inspired by Herder 
or may have reflected independently of him that they had much to learn from the life 


12 For a fuller treatment of this point, see Considine (2014a). 

13 For DWB’s coverage and presentation of historical evidence, see Osselton (2000: 62); the complete 
absence of Passow’s name from the index nominum of Kirkness (1980) is suggestive. 

14 Grimm (1846/1847: 118) ‘eine Naturgeschichte der einzelnen Wörter’. 
sciences as represented by Linnaeus, whose writings fascinated William Jones (Aarsleff 
1967: 124), and moved Adelung to remark that his vision of semantic development might 
be seen as Linnaean (Versuch eines vollständigen grammatisch-kritischen Wörterbuches 
der hochdeutschen Mundart (1774-86) 1: xiv-xv). 

So, one element in the apparently independent development of historical principles 
by Jamieson, Passow, and the Grimms was the understanding, increasingly widespread 
from the end of the eighteenth century onwards (see Aarsleff 1967: 86), that language 
change could usefully be seen in organic terms, and that individual words could there- 
fore be seen as having histories of development, from birth to death, like those of people 
or other organisms. A second, more obvious element was the cumulative achievement 
of the eighteenth century in making early texts available and in assigning dates to them. 
A glance at Jamieson’s bibliography (1808: ix-xv) makes the point, and it is explicit in 
Liddell and Scott’s acknowledgement of their dependence on Henry Fynes Clinton's 
chronology of Greek literature in his Fasti Hellenici of 1824-34. In order to build up a 
chronological sequence of instances of a given word, lexicographers needed dated texts. 
The Grimms had the great advantage in this respect that many of the texts they drew on 
were available in self-dating printed editions, and this advantage likewise benefited the 
editors of the OED and other dictionaries of living languages. 


10.4 THE DEVELOPMENT 
OF HISTORICAL PRINCIPLES 


Eleven years before the first fascicle of DWB was published, Émile Littré had signed a 
contract with the publisher Hachette, a school friend of his, for a similar work, which 
was to be published as his Dictionnaire de la langue française (1863-73). Littré acknowledged 
the inspiration of DWB in the execution of his dictionary, and followed the 
Grimms’ example in his scanty provision of dating; he treated medieval material, of 
which selected short quotations, chronologically arranged, were presented in appropri- 
ate entries, as providing a historical background for the post-1600 French which was 
the focus of his dictionary (see Osselton 2000: 64-8). A third project had been founded 
by the Dutch philologist Matthias de Vries in 1851, before either the Grimms or Littré 
had begun to publish. This, a dictionary of post-medieval Dutch, would be realized on 
increasingly rigorous historical principles as the Woordenboek der Nederlandsche Taal 
(WNT; 1882-1998 and 2001; see Osselton 2000: 68-72, and Eickmans 2012). These three 
dictionaries were followed by the NED/OED, the preparation of which began in 1857, 
with a first fascicle appearing in 1884. It has been rightly said that ‘The one feature which 
most of all marks out the OED among its rivals [DWB, Littré, WNT] is the sheer length 
of its continuous documentation from the earliest records of English down to the very 
latest’ (Osselton 2000: 73), and this length of documentation makes it possible to argue 
that the OED is the dictionary in which historical principles are most elaborately and 
satisfyingly developed. Two more major historical dictionary projects had begun by the 
time the first fascicle of the OED appeared. After an abortive project in the 1860s, serious 
planning for a historical dictionary of Swedish from sources later than 1520 had begun 
in 1883, and would lead to the publication of the Ordbok över svenska språket, usually 
known as the Svenska Akademiens Ordbok (SAOB, 1893-). Finally, in the first year of the 
twentieth century, the Danish scholar Verner Dahlerup, who had been collecting mate- 
rials for a Danish dictionary since 1882, signed a contract for its publication. He realized 
in the following years that his work should be modelled on the ongoing SAOB, DWB, 
WNT, and OED projects; it was to become the Ordbog over det danske Sprog, an histori- 
cal dictionary of Danish since 1700 (1918-56; for it, and for SAOB, see Malmgren 2012). 
The major European languages which were not documented by these historical diction- 
aries were Spanish, Italian, Portuguese, and Russian, each of which was served by a dic- 
tionary in the normative and non-historical academy tradition; the same might be said 
of Polish, though the six-volume dictionary of Linde was not the work of an academy. 

This is not the place for a survey of the historical dictionary projects of the nineteenth 
century, or those of the twentieth and twenty-first centuries. A first sketch towards such 
a survey is available as Considine (2010b), and more detailed articles are gathered in 
Schweickard (2011). A few typological remarks, drawing for the most part on examples 
from English, must be offered instead. First, each of the six projects mentioned above 
registered the vocabulary of a European language with a written standard. Their shared 
principle was the illustration of every sense of every entry with quotations from written, 
and usually printed, sources, starting with the earliest which could be obtained, or with 
one early enough to indicate currency at the beginning of the period which the diction- 
ary documented, and continuing to the last available source for an obsolete sense, or to a 
source recent enough to suggest the currency of one still in use. None of them attempted 
the exhaustive coverage of regionalisms, of highly technical vocabulary (the OED was 
the fullest in this regard), or of the vocabulary of the very earliest stages of the language. 

Dictionaries of medieval language varieties continued to be undertaken in the nine- 
teenth century and beyond. The major ones were founded on quotation evidence, 
although problems of dating sometimes precluded the strictly historical arrangement 
of this evidence. This problem has been handled variously: for instance, quotations in 
the Middelnederlandsch woordenboek (Middle Dutch Dictionary), founded in 1885, are 
dated irregularly in the entry text, while those in the Vroegmiddelnederlands woorden- 
boek (Early Middle Dutch Dictionary) of 1999 are dated consistently but, because the dic- 
tionary only covers texts from a single century, they are not sorted chronologically. In 
the Middle English Dictionary (1952-2007), quotations are dated with double attention 
to the differing dates of composition and of extant witnesses. 

A different set of dating problems was presented by dictionaries of regionalisms, 
for which the evidence was often a mixture of datable texts and recent recordings of 
lexical items which might have had long unattested lives. Such dictionaries have had 
to be arranged on what might be called mixed historical principles. The original plan 
of the English Dialect Dictionary to arrange quotations by county but to date each one 
in the text (see Wright 1932: plate facing 2: 370) had to be replaced by a plan in which 
their sources were indicated by sigla and information about dating had to be retrieved 
from a bibliography. F. G. Cassidy and R. B. Le Page’s Dictionary of Jamaican English of 
1967 dated its quotations, and differentiated lemmata well attested in print from those 
known chiefly from oral sources by printing the former in full capitals. The Dictionary 
of American Regional English treats all lemmata consistently, but also dates all its quota- 
tions, including those based on fieldwork. 

There is a long tradition of compiling specialized historical dictionaries, notably of 
legal terms, whose meanings in early texts may be important to lawyers centuries later. 
This continues in work as diverse as the Deutsches Rechtswörterbuch (12 volumes had 
been completed by 2013, registering 91,584 entries from Aachenfahrt to schwedisch), 
Juhani Norri’s on late medieval English medical vocabulary (see Norri 2010), and 
Jonathan Lighter’s and Jonathon Green's dictionaries of English slang. Single-author 
dictionaries on historical principles do exist, notably the Goethe Wörterbuch (1978- ); 
the obvious candidate for such a dictionary in English is Shakespeare, but his vocabu- 
lary is so fully covered in the OED that separate dictionaries of it, which are numerous, 
have tended to be on a modest scale. 


10.5 CURRENT ISSUES 


At the beginning of this chapter, the revision of the OED’s entry historical was used as a 
case study on the changing principles of historical lexicography. Its development sug- 
gests some of the current issues in the field. 

First of all, it is a reminder of the likelihood that any given loanword has entered 
English usage by a process of multiple borrowing, and that any given affixed form has 
entered English usage by multiple applications of the same derivational process, rather 
than, in either case, by organic development after a single borrowing or act of affixa- 
tion (cf. Durkin 2009: 68-73 and 165-9). There is no reason to suppose that the sec- 
ond attestation of historical in the OED, from a poem of 1521, is the work of a reader of 
the single manuscript in which the first attestations appear, or even that it shares the 
first attestations’ close relationship with Latin historicus rather than being a variant of 
English historial or a formation from English historie. In one of the last of the special 
uses registered in the OED entry, historicall shirt (a1640, in a single primary attestation 
with a nineteenth-century echo) where the sense is close to that of the hapax legomenon 
historified (1633), it is hard to tell whether the form historicall was adapted from its more 
mainstream uses or formed independently from history ‘pictorial representation’ (OED 
s.v. history sense 5). Insofar as lexicography on historical principles was designed to tell 
monogenetic stories, it is now doing work for which it was not designed. The quotation 
evidence in historical dictionaries is often used as the basis for simplistic claims about 
the date at which a word entered a given language, and this is because readers suppose a 
monogenetic narrative to be normally implicit in this evidence. It is, of course, hard to 
imagine a format for the presentation of attestations which would not be misleading in 
this way. 

Even though it does not tell an extremely complex story, the entry historical pre- 
sents an interplay of forms in different languages which is too complex to be fully 
documented in a dictionary of any one of those languages: it is not the business of an 
English dictionary to ascertain the ways in which German historisch might underlie 
the German-speaking Alexander von Humboldt's use of French historique in 1823 (cf. 
German historische Geologie, attested at least as early as 1834) or indeed the Russian form 
istorik-lingvist of 1873 (cf. ‘Grimm, the great champion of the historical school of 
linguists’ cited from a text of 1832 in OED s.v. historical sense 2c). A similar interplay is part 
of the history of many English words, and of course in many words in any European lan- 
guage with a rich written tradition. The whole concept of a historical dictionary of a sin- 
gle language must have seemed more straightforward as the great nineteenth-century 
historical dictionaries were being founded than it does now, sensitive as their makers 
were to the connections between languages and cultures. Again, it is hard to imagine a 
large-scale alternative, although multilingual historical dictionaries for clearly defined 
subject areas are conceivable. 

The online publication of major historical dictionaries would, to be sure, make it pos- 
sible for such dictionaries to link to each other. The linking of historical dictionaries 
is already being achieved. For instance, OED entries link to the relevant entries in the 
Dictionary of Old English, the Middle English Dictionary, and the Historical Thesaurus 
of the OED. When the project team of the Mittelhochdeutsches Wörterbuch sent its 
congratulations to the editors of the Middle English Dictionary on its completion, their 
message made the suggestive statement that ‘We look forward to establishing and con- 
tinuing contact between the two projects, as we have put online the existing Middle 
High German dictionaries in an interlinked compound’ (MED tributes 2002: 19). The 
Geïntegreerde Taalbank of the Instituut voor Nederlandse Lexicologie gives integrated 
access to five historical dictionaries, which can be searched jointly or individually: the 
Woordenboek der Nederlandsche Taal, the dictionaries of Old Dutch, Early Middle 
Dutch, and Middle Dutch, and the Woordenboek der Friese Taal / Wurdboek fan de 
Fryske taal (see Schoonheim and De Tier 2010; Sijens and Depuydt 2010). The presence 
of the last of these in the Taalbank is particularly suggestive: the dictionaries of Dutch are 
being partially integrated with a dictionary of a different language, in fact the language 
most closely related to English. Linking, however, is not the same as full integration. 
There are such deep differences between the major historical dictionaries that the vast 
labour of linking one to another would, at least in the near future, often be pointless: for 


1 There were inevitable errors in the establishment of links, as Charlotte Brewer has pointed out 
(Brewer 2005: sect. ‘OED editions, updates, and revisions’, subsect. ‘Other changes introduced in OED 
Online re-launch of December 2010’): ‘On occasion, these links also (as of December 2011) need refining 
and adjustment. So s.v. batten, v.1, we are cross-referenced to a DOE entry which is relevant, batian 
(which can mean “to thrive”), and an MED entry which is irrelevant, batten “to beat, stamp, pound, 
knock”.’ 
instance, the rudimentary entry historisch (published 1873) in the online version of the 
Deutsches Wörterbuch sheds no light on the German role in the development of English 
historical. Even if that entry were developed further, the aspects of the development 
of the German word of most interest to English lexicographers might not be the most 
important from a German-language perspective. The historical record of Frisian would 
presumably shed only indirect light on the sense-development of English words. 

The online availability of huge and increasing bodies of pre-contemporary and con- 
temporary texts has presented historical lexicography with daunting new opportunities. 
The editors of the nineteenth-century dictionaries had some concordances and verbal 
indexes available to them (the latter including such ad hoc verbal indexes as those pre- 
pared by William Chester Minor, for which see Knowles 1990), but much of their work 
depended on the materials gathered in the course of imperfectly systematic reading pro- 
grammes. By contrast, OED lexicographers today have access to textbases such as Early 
English Books Online and Eighteenth-Century Collections Online, which between them 
offer digital images of the vast majority of pre-1800 English printed books, 48,339 of the 
pre-1700 books being available as keyboarded text as of April 2014, and all the 
eighteenth-century books being searchable. The pre-1800 material least readily available to 
them is very much that which was least represented in the first edition of the OED: non- 
literary texts which were transmitted in manuscript until after 1800. These, for the most 
part, have to be searched by well-established reading-programme techniques, unless 
their texts are among the information made available (with famous problems of patchy 
availability and automatically generated metadata) on Google Books. Major electronic 
corpora such as the Corpus Diacrónico del Español (CORDE) of the Real Academia 
Española, and the Corpus del Español created by Mark Davies, are becoming available to 
support the historical lexicography of other languages. 

A striking feature of entries in the Trésor de la langue française is their systematic 
provision of frequency information, made possible by the dictionary’s being founded 
on a computerized text archive. This could, in theory, be done for dictionaries with a 
longer historical perspective. A graphical presentation of frequency information could, 
for instance, make it easy to see the way in which historicall, with a couple of fairly iso- 
lated attestations before 1530, comes into frequent use from the 1530s onwards, no doubt 
partly as a result of the use of historical Christian ‘Christian whose faith is based on 
historical evidence rather than personal conviction’ in polemical debate. Such information 
would need to be based on a corpus which would remain stable for the period of the 
compilation of the dictionary, or whose changing results could be updated in all entries. 
Building a corpus sufficiently large and sensitively tagged to provide interesting fre- 
quency information for most of the entries in a large historical dictionary would be a 
formidable undertaking. 
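
As a rough illustration of the kind of frequency information discussed above, the following Python sketch computes, decade by decade, how often a form occurs per 10,000 tokens in a dated corpus. The toy data, the function name decade_frequencies, and the per-10,000 normalization are all assumptions made for illustration; they are not drawn from the Trésor de la langue française or the OED, and a real historical corpus would also need the stability and tagging described above.

```python
from collections import Counter

# Invented toy corpus of (year, token) pairs; NOT real attestation data.
TOY_CORPUS = [
    (1521, "historicall"), (1523, "the"), (1525, "of"), (1528, "and"),
    (1532, "historicall"), (1534, "historicall"), (1536, "historicall"),
    (1538, "the"),
]


def decade_frequencies(dated_tokens, target):
    """Return {decade: occurrences of `target` per 10,000 tokens}."""
    totals = Counter()  # all tokens seen in each decade
    hits = Counter()    # tokens equal to `target` in each decade
    for year, token in dated_tokens:
        decade = (year // 10) * 10
        totals[decade] += 1
        if token == target:
            hits[decade] += 1
    return {d: 10_000 * hits[d] / totals[d] for d in sorted(totals)}


rates = decade_frequencies(TOY_CORPUS, "historicall")
for decade, rate in rates.items():
    print(f"{decade}s: {rate:.0f} per 10,000 tokens")
```

Normalizing by the total number of tokens per decade matters because the surviving text base grows unevenly over time; raw counts alone would mostly reflect how much text survives from each period.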

Having said this, it is worth remembering that historical dictionaries are not cor- 
pora. The OED in particular has been criticized for failing to do some of the things 
which a good historical corpus would do, such as balancing the representation of male 
and female authors in its quotation paragraphs. Some kinds of balance are within the 
power of historical lexicographers; an example of a historical dictionary in which they 
have been rather carefully considered is the Woordeboek van die Afrikaanse taal (see 
Harteveld and van Niekerk 1996). But others are impracticable: one rough estimate 
(Considine 2009: 631) suggests that finding female-authored substitutes for one quarter 
of the male-authored quotations in the OED might be achieved in one hundred thou- 
sand person-hours. It would be very hard to justify this deployment of the dictionary’s 
resources. 

A last, and salient, current issue in historical lexicography has been glanced at above: 
large historical dictionary projects of the kinds which lent themselves to fascicular pub- 
lication in the nineteenth and early twentieth centuries now lend themselves to online 
publication. This provides subscription revenue; facilitates the global management of, 
for instance, the complicated bibliographical information characteristic of historical 
dictionaries; facilitates linking to other dictionaries and reference sources such as indi- 
ces fontium (and, in the case of the OED, the Oxford Dictionary of National Biography); 
and frees publishers and librarians from the difficulty and expense of handling and stor- 
ing very large volumes of paper. On the other hand, online dictionary entries are less 
easy to consult than printed ones, especially when a long entry, or a number of related 
entries, are to be consulted. Online text also requires more active maintenance than 
printed text: most information which was published in book form fifteen years ago is 
still readily accessible, while much information which was published online at the same 
time has vanished without trace. For this reason, the tendency of dictionary projects to 
move towards online-only publication, as may be done not only by the Oxford English 
Dictionary but also by the smaller and much more readily print-publishable Dictionary 
of Canadianisms on Historical Principles, is disturbing: when work of lasting scholarly 
value is only preserved in online form, for how many decades can its availability be 
guaranteed? 


11.1 INTRODUCTION 


The quotation evidence is the bedrock of any historical dictionary. The relationship 
between the definitions of each sense in a historical dictionary and the quotations 
that accompany it is particularly close. In the compilation of a historical dictionary 
(at least in modern times) the quotation evidence provides a sample of the empirical 
data on which the definitions have been based. Indeed, for rare words or meanings, 
or for historical periods for which not much data survives, the quotation evidence 
given in the dictionary entry may be all of the evidence that exists. Because of this close 
inter-relationship between quotation evidence and definitions, the two topics will be 
treated together here. 

This chapter will draw its examples largely from two different historical diction- 
aries, documenting two different languages of the British Isles: the Oxford English 
Dictionary (OED),¹ and Geiriadur Prifysgol Cymru: A Dictionary of the Welsh 
Language (GPC).² The rather different perspectives of these two dictionaries, which 
apply broadly similar methodology, but document the histories of languages with 
very different contemporary speaker communities and with a very different extent of 
documentation available for earlier periods, make it possible to gain a more rounded 
picture of the historical method in action than would be possible from looking at one 
dictionary in isolation. 


1 See <http://www.oed.com>. 


2 See <http://www.welsh-dictionary.ac.uk> for the project's website and <http://gpc.wales> for ‘GPC 
Online’ 


11.2 THE INTERRELATIONSHIP 
BETWEEN QUOTATION EVIDENCE 
AND DEFINITIONS IN HISTORICAL 
DICTIONARIES: HISTORICAL BACKGROUND 



Historical dictionaries by their very nature are evidentially based, and it is the 
quotations or citations³ that provide the evidence upon which the sense divisions and 
definitions are based. Many dictionaries in the period preceding that of the great historical 
dictionaries included quotation evidence; however, they tended not to do this on a 
regular basis for every entry, but rather to provide supporting evidence from acknowledged 
authors. For example, Samuel Johnson’s great Dictionary of the English Language (1755) 
provided many quotations from a wide selection of authors, but not all entries were sup- 
ported by quotation evidence, especially the more common words, and no quotations 
were provided for function words. The quotations appear to have been provided 
primarily as examples of ‘good usage’ or for their intrinsic value rather than to help 
exemplify the meaning. In his Plan of a Dictionary of the English Language, Johnson explained 
his use of quotations as follows: 


In citing authorities, on which the credit of every part of this work must depend, it 
will be proper to observe some obvious rules, such as of preferring writers of the first 
reputation to those of an inferior rank, noting the quotations with accuracy, and of 
selecting, when it can be conveniently done, such sentences, as, besides their imme- 
diate use, may give pleasure or instruction by conveying some elegance of language, 
or some precept of prudence, or piety.⁴ 


As already noted, an examination of the practice of a Welsh dictionary, Geiriadur Prifysgol 
Cymru: A Dictionary of the Welsh Language (GPC) will provide the basis for much of the 
discussion in the present chapter. Early Welsh dictionaries followed a similar pattern 
to contemporary English dictionaries, although some simply included references to 
particular authors or texts rather than providing quotations, for example John Davies’s 
highly influential Antiquae linguae britannicae ... et linguae latinae, dictionarium duplex of 
1632 and Thomas Lloyd’s manuscript dictionary of c. 1730.⁵ John Walters’s English-Welsh 
Dictionary of 1794 is similar to that of Samuel Johnson in providing numerous examples 


3 Some lexicographers, especially in the continental tradition, use ‘citation’ in the same sense 
as ‘quotation’ to refer to a quoted example; here, ‘quotation’ will be used throughout so as to avoid 
confusion. 

4 As edited by Jack Lynch and cited from Fontenelle (2008: 29). 

5 An interleaved copy of Davies's dictionary with very extensive manuscript additions (National 
Library of Wales Minor Deposit 1389; see Jones (1955)). It is possible that Thomas Lloyd intended to cite 
more fully in the printed version of the dictionary. 
from well-known authors, the Bible, and other well-known sources, but it also appears to 
include numerous made-up examples.⁶ Some later Welsh dictionaries provide quotations 
to illustrate certain entries (e.g. William Owen Pughe’s Geiriadur Cynmraeg a Saesoneg: 
A Welsh and English Dictionary (1803) and D. Silvan Evans's Dictionary of the Welsh 
Language (1887-1906)), but this is not done on a regular basis. 

In the European lexicographical tradition, the frequent use of quotations in a dic- 
tionary, as Considine (2008: 29) has shown, goes back at least to the sixteenth-century 
lexicographers of classical Latin, Ambrogio Calepino and Robert Estienne, whose work 
was extremely influential. Considine further notes (2008: 86) that the work of Robert's 
son Henri, with its extensive quotation evidence, albeit not in chronological order, is 
‘an immensely important precursor to the nineteenth-century principle that every dic- 
tionary entry should be historically arranged, and should tell the story of the word it 
documents from the earliest witnesses to the latest’ (see also Considine, this volume). 
However, it was the Philological Society's New English Dictionary (later more generally 
known as the Oxford English Dictionary) that first adopted the principle of systemati- 
cally providing quotations to illustrate every sense in the dictionary, arranged in chron- 
ological order from the earliest known attestation. In his introduction to the dictionary, 
James Murray explained the ‘historical principles’ upon which the dictionary was based: 


The order in which these senses were developed is one of the most important facts 
in the history of the word; to discover and exhibit it are among the most difficult 
duties of a dictionary which aims at giving this history. If the historical record were 
complete, that is, if we possessed written examples of all the uses of each word from 
the beginning, the simple exhibition of these would display a rational or logical 
development. The historical record is not complete enough to do this, but it is usually 
sufficient to enable us to infer the actual order. In exhibiting this in the Dictionary, 
that sense is placed first which was actually the earliest in the language: the others 
follow in the order in which they have arisen. As, however, the development often 
proceeded in many branching lines, sometimes parallel, often divergent, it is evident 
that it cannot be adequately represented in a single linear series. (OED i, xxi) 


Indeed, the entire dictionary was based upon a collection of quotations gathered 
expressly for that purpose. 

James Murray established the principle for the OED that that dictionary’s definitions 
should be based upon, and corroborated by, the evidence of quotations (on earlier sim- 
ilar developments in other historical dictionaries see Considine, this volume, Section 
10.3). Murray believed that the historical development of the various senses could be 
determined from the evidence, and that this semantic development could be traced 
chronologically through the quotations. With many words this is indeed the case but, in 
other cases, the evidence for figurative use may precede the literal, or specialized senses 
may predate more general ones. This may simply be the result of a lack of quotations 


6 Johnson's dictionary included around 114,000 quotations, some of which he modified or invented 
himself (Jackson 2002: 46). 
demonstrating the actual development, or it may be that the history of a word is more 
complicated, with some figurative uses being borrowed from, perhaps, Latin at an ear- 
lier date than the more literal use which survived into Anglo-Norman and was then bor- 
rowed into English. This is particularly true of Welsh where English influence has been 
very pervasive over many centuries, and where the same English word may have been 
borrowed and reborrowed repeatedly into Welsh, often with different meanings, as the 
meanings of the English word have changed over time. Murray acknowledged that this 
often occurred in English: 


But in adopted or adapted words which had already acquired various significations 
in the language (e.g. Latin) from which they were taken, it often happens that the 
order in which the senses appeared in English does not agree with the natural order 
in which they were developed in the original language. The English order is, in fact, 
accidental. (OED i, xxi) 


11.3 DEFINING IN HISTORICAL 
DICTIONARIES 


Defining in synchronic dictionaries is treated in Chapters 7 to 9 and most of what is 
said there is equally applicable to defining in a historical dictionary. Historical lexicog- 
raphy as envisaged by Murray is descriptive in nature, being based almost entirely on the 
quotation evidence available, although there is inevitably a degree of subjectivity in the 
interpretation of problematic quotations. The historical lexicographer has some addi- 
tional problems to contend with. Whereas a modern dictionary is likely to draw heav- 
ily from a collection of relatively modern quotations and computer corpora of current 
usage, historical lexicography requires the lexicographer to be able to decipher evidence 
from all periods of the language in question. 

It is easy to misinterpret a short quotation (from any period) by not having sufficient 
context. A complete sentence may not even be enough, and it is sometimes necessary to 
read a whole paragraph to interpret a difficult word correctly. Poetry introduces further 
complications, where words may be used more for their sound than their sense, espe- 
cially in traditions such as Welsh where an intricate system of rules was developed over 
many centuries, such as cynghanedd, a strict set of rules governing internal consonantal 
correspondences and other metrical features (see Koch 2006: 2, 537-9). This severely 
limited the choice of words available in any particular context, and the intelligibility of 
a poem often depended heavily on the ingenuity of the poet. Some poets deliberately 
sought to use obsolete vocabulary and syntax in order to give their work greater gran- 
deur at the expense of intelligibility. Vaticinatory poetry was usually couched in deliber- 
ately obscure terms, using ambiguous vocabulary to obfuscate the prophecy. Problems 
of this kind can make it extremely difficult to disambiguate the various senses of a word. 
As quotations of this kind can often be the earliest surviving evidence of the use of a 
particular word, it can be difficult to determine the earliest attestations of particular senses. 


11.4 ENTRY STRUCTURE AND ITS IMPACT 
ON QUOTATION EVIDENCE AND DEFINITIONS 


With sufficient evidence, the development of meanings throughout the history of 
English—which Murray believed that it was possible to trace—should be evident, but, of 
course, the evidence which survives is only a very partial record of the language, and the 
lexicographer’s quotation collection is often but a sample of the surviving evidence. It is 
hardly surprising, therefore, that it can be very difficult to show the semantic develop- 
ment of many words. Zgusta (1971: 202) acknowledged this difficulty: 


The compilers of the very big historical dictionaries frequently cannot indicate the 
single senses of the words, and above all not the single quotations, in their real histor- 
ical sequence, because such a presentation would be rather chaotic; they must present 
their material, not [i]nfrequently, in logical groups, or by semantic connections, or by 
some other principle, and proceed historically only within these ‘chapters’. This is one 
of the reasons why there is an important area of overlapping between the historical 
dictionaries on the one side, and the big monolingual dictionaries on the other. 


Silva (2000: 91-3) discusses the difference between the ‘logical’ ordering of senses and 
‘historical’ ordering according to the available quotation evidence, acknowledging that Murray 
was obliged on occasion to give precedence to the former over the latter, but that the editors 
of OED3⁷ ‘[w]ith a mass of new data at their disposal . . . are following the quotation evidence 
in ordering the senses, applying the historical method more rigorously than is the case in 
the first edition of the OED—but in tandem with the logical approach’ (Silva 2000: 93). 

The historical dictionary with which the author is most familiar is Geiriadur Prifysgol 
Cymru: A Dictionary of the Welsh Language (GPC), which is not a ‘pure’ historical 
dictionary on Murray’s model, but rather a hybrid of three types of dictionary: a 
historical dictionary, a large monolingual dictionary, and a bilingual dictionary. GPC’s sense 
divisions are primarily based on meaning or grammatical function, rather than on per- 
ceived historical development. The major current sense is often given precedence over 
the perceived ‘original’ meaning if the word is still in everyday use. The metalanguage 
of the dictionary is Welsh, and therefore the work also functions as a large monolingual 


7 The OED is currently undergoing its first ever complete revision of all existing entries. OED3 refers 
to this project of revising, supplementing, and updating the complete dictionary text, revised and new 
entries being referred to as constituting collectively the third edition of the dictionary. OED2 refers 
to the second edition of 1989, which brought together the first edition and its Supplements in a single 
alphabetical sequence, but with only minimal revision to existing dictionary text. 
dictionary of Welsh of all periods, but as English synonyms (or definitions, if there are 
none) are given for each sense, it also functions as a bilingual Welsh-English dictionary. 

GPC typically has much more compressed articles than corresponding ones in the 
OED. As an example, here are the entries for two cognate words,⁸ showing an outline of 
the senses and the dates of earliest attestation of those senses:⁹ 


OED3: 

car n. 

1. a. A wheeled, usually horse-drawn conveyance; a carriage, cart, or wagon. In 
later use chiefly with modifying words. 
a1320 

b. Chiefly literary and poet. A chariot, esp. of war, triumph, splendour, or 
pageantry; spec. the chariot of Phaethon or the sun, or that in which the moon, 
stars, day, night, or time are said to ride. 
c1350 

c. A vehicle resembling a cart without wheels; a sleigh. rare (in later use Sc. and 
Canad. regional (Newfoundland)). 
c1488 

2. a. The passenger compartment of a balloon, airship, cableway, etc.; a gondola. 
1783 

b. Chiefly U.S. The passenger compartment or cage of an elevator or lift. 
1847 

3. Chiefly N. Amer. 
a. A railway carriage or wagon (freq. with distinguishing word). Also: a 
streetcar or tramcar. 
1826 

b. As many or as much as a railway car will hold; a carload of freight 
transported by railway. 
1851 

4. = MOTOR CAR n. Now the usual sense. 
1896 


8 Eng. car is from either Anglo-Norman or French from a Gallo-Latin form borrowed from Celtic. 
Welsh car is from the same Celtic root. Therefore the two words are ultimately cognate. 

9 The examples from OED3 are from the third (online) edition, December 2012: 
<http://www.oed.com/view/Entry/27674>; accessed 12 April 2015. 


GPC1: 

car 

1. vehicle, car, sled, dray; motor-car. 
13th cent. 

2. framework for keeping and mounting articles, &c., rest, rack, stand, horse; cradle 
on scythe, &c. 
c1400 


As can be seen, GPC’s sense 1 encompasses OED’s senses 1.a. to 1.c. and 4. The Welsh 
quotation evidence is not sufficient to differentiate the senses much further, although 
the meaning ‘motor car’ can be disambiguated, and will probably merit its own sense 
paragraph in the revised edition. Although there is no reason why Welsh car should not 
have followed the same semantic development as English car, the development of the 
meaning ‘motor car’ must have been heavily influenced by English (and indeed, the ear- 
liest Welsh attestation in this sense occurs in the collocation car modur, which is almost 
certainly a calque on English motor car). The lexicographer must decide whether there 
are sufficient unambiguous quotations to form separate sense paragraphs, and what to 
do with the ambiguous quotations, especially if they contain the earliest attestations. 
Each dictionary team will have different guidelines as to what is acceptable. Historical 
dictionaries are occasionally criticized for their lack of semantic precision, but it is diffi- 
cult to see how that precision could be enhanced significantly without having to create a 
sense paragraph defined as ‘ambiguous’ or ‘of obscure meaning’ for very many words. To 
simply ignore a large proportion of the evidence, including, perhaps, the earliest attesta- 
tions, would be intellectually dishonest and misleading to the dictionary user. 
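
The tension between logical grouping and strict chronology can be seen by sorting the OED3 senses of car quoted above by their earliest attestation. The Python sketch below is only an illustration: the sense labels and dates are those given in the outline, but the names OED_CAR, year_of, and chronological are hypothetical, and treating an approximate date such as ‘a1320’ or ‘c1488’ as an exact year is a simplifying assumption.

```python
import re

# Sense labels and earliest-attestation dates as given in the OED3 outline
# of "car" quoted in this chapter ("a" = ante, "c" = circa).
OED_CAR = [
    ("1a", "a1320"), ("1b", "c1350"), ("1c", "c1488"),
    ("2a", "1783"), ("2b", "1847"),
    ("3a", "1826"), ("3b", "1851"),
    ("4", "1896"),
]


def year_of(date):
    """Extract the four-digit year, ignoring any 'a' or 'c' prefix."""
    return int(re.search(r"\d{4}", date).group())


def chronological(senses):
    """Return the senses ordered by earliest attestation."""
    return sorted(senses, key=lambda sense: year_of(sense[1]))


entry_order = [label for label, _ in OED_CAR]
date_order = [label for label, _ in chronological(OED_CAR)]
print("entry order:", entry_order)
print("date order: ", date_order)
```

In the date ordering, sense 3a (1826) moves ahead of sense 2b (1847), which shows on a small scale how grouping related senses together departs from a purely chronological sequence.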

It is usually easier to distinguish functions rather than senses; thus the senses of a 
verb which is used both transitively and intransitively, for example, might be sepa- 
rated according to whether they are transitive or intransitive, with subsenses arranged 
accordingly. Thus the Welsh verb pallu is defined in two senses, intransitive (‘to fail, run 
out, be lacking, weaken, become dim (of sight), be destroyed, die, perish, cease; refuse, 
deny’) and transitive (‘to refuse, deny, fail; (cause to) fail or cease’). Similarly, figurative 
senses can often be distinguished from literal ones, although ambiguities of definition 
can arise where a literal meaning occurs in an extended figurative context (see Section 
11.7 below). 


11.5 DEFINING OBSOLETE OR OBSOLESCENT 
WORDS AND MEANINGS 


Osselton (1995: 46-52) has described the challenges posed by obsolete or obsolescent 
words of various kinds in dictionaries of living languages, suggesting that they fall into 
a number of categories including: ‘obsolete terms’ such as hoker (‘disdain’); ‘historical 
terms’ such as villein; ‘discontinued terms’ such as shilling; ‘obsolescent terms’ such 
as wireless (in the sense of audio radio broadcasting); ‘archaisms’ such as aviator; ‘poetic 
archaisms’ such as welkin (‘sky’); ‘formal archaisms’ such as mien or to enterprise which 
are in ‘modern elevated use’; ‘jocular archaisms’ such as trencherman ‘a person with a 
good appetite’; as well as archaisms or obsolete terms in proverbs and sayings, such as 
glisters in ‘all that glisters is not gold’. A large proportion of the words in any historical 
dictionary fall into one or more of these categories and present particular problems of 
labelling to the lexicographer. In the case of dictionaries of living languages, such labelling 
may be overtaken by developments within the language in question. For instance, since 
Osselton delivered the lecture quoted above in 1978, the word wireless has had an 
enormous revival in use in the field of computing to refer to technologies such as ‘Wi-Fi’ and 
‘Bluetooth’. Rapid changes of this kind affect historical dictionaries in particular because 
such works may take many decades to complete or revise. (See further Section 11.7.) 

A similar situation arises in the case of words which may still be in current usage but 
whose meaning has developed over time, perhaps as a result of scientific discoveries, as 
in the case of the noun planet. OED3 handles this by defining three major senses: 


1. a. Each of the seven major celestial objects visible from the earth which move 
independently of the fixed stars and were believed to revolve around the earth 
in concentric spheres centred on the earth (in order of their supposed distance 
from the earth in the Ptolemaic system: the moon, Mercury, Venus, the sun, 
Mars, Jupiter, and Saturn). Now hist. 

b. Chiefly Astrol. and Alchemy. Any of these bodies (in modern use sometimes 
also including Uranus, Neptune, and Pluto) regarded in terms of its supposed 
influence or quality in affecting persons, events, and natural phenomena. 
Hence in later use: a controlling or fateful power, usually of an occult nature. 


3. a. Astron. Any of various rocky or gaseous bodies that revolve in approximately 
elliptical orbits around the sun and are visible by its reflected light; esp. each 
of the planets Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune, 
and (until 2006) Pluto (in order of increasing distance from the sun); a similar 
body revolving around another star. Also: any of various smaller bodies that 
revolve around these. 


(and adding a note on the recent ruling that Pluto is now considered a ‘dwarf planet’
rather than a planet proper).¹⁰ Note the use of the past tense in the first definition and
the label ‘Now hist.’ and the use of the qualifier ‘in modern use’ in the second. Note also
the use of the labels ‘Astrol. and Alchemy’ and ‘Astron.’¹¹


10 OED3 planet n., online version March 2006, <http://www.oed.com/view/Entry/145058>; accessed
12 April 2015.
11 See further Section 11.7 on ‘Labels or tags’.
11.6 DEFINING GUIDELINES


Each historical dictionary project will have guidelines stipulating how inclusive the dic- 
tionary’s definitions should be. Some may adopt an inclusive policy which aims at pro- 
viding a definition or sense equivalent for every occurrence of a word in every quotation 
cited. At the other extreme, some may provide indicative definitions or synonyms and 
supply sufficient quotations to give additional contextual or semantic information, per- 
mitting the quotations themselves to add precision to the definitions. 

Silva (2000: 84-5) has described the OED’s practice in defining various grammatical
classes of words, which developed over the early years of the project and has subsequently
been followed by many other dictionaries. The development of a consistent style
of definition aids both the lexicographer and the dictionary user, although this formulaic
style of definition may prove rather impenetrable to some users.

OED's former Editor-in-Chief, John Simpson, has some pertinent and practical 
advice in this respect which it is worth quoting in full: 


1) Use standard modern English vocabulary and idioms—be neutral (if anything, 
slightly conservative) and not colloquial 

2) Some aspects of a word are central to a definition, others are relevant but not cen- 
tral, and yet others are peripheral. You should not try to include every possible 
facet of a term in a definition 

3) Beware of creating lists (especially those punctuated with ‘etcetera’ and ‘and the 
like’), as these lead to a scattershot approach to definition; make words fight for a 
place in your definition 

4) If you are not certain whether to add some feature to a definition, it is usually best 
to leave it out 

5) Write the definition for the user, who can be assumed (in a historical dictionary) 
to have a general knowledge of the language used in definitions, and not to need 
absolutely everything spelled out 

6) Don't define solely by context, as this means you will end up with many more sub- 
senses than you need (this is the ‘lumping’ versus ‘splitting’ argument) 

7) If you are revising an older definition, do not assume that ‘old’ is wrong and ‘new’
is right; avoid the wilful destruction of the old if it is still the best way to define a 
word 

8) Once you have written a definition, read it again several times to check that it 
flows, in terms of style, and aligns with the facts that you have, and then make it 
slightly shorter. (Simpson 2008: 127-8) 


To these could be added the apparently obvious, but occasionally overlooked, principle 
that the definition and synonyms (if any) should be able to replace the definiendum both 
grammatically and semantically (although see Hanks, this volume, on the difficulties of 
this approach for some areas of vocabulary). Some dictionary guidelines also stipulate that
at least one of the definitions or synonyms in a sense paragraph should be able to replace
the word defined in every quotation, although some dictionaries permit greater latitude by
defining the most important meanings only and leaving some quotations to stand by themselves.
They may be self-defining, such as an entry from another dictionary, or they may
refer to a specific meaning. Quotations can be used effectively in this way to provide extra
information about the context in which a word is used (e.g. only in poetry, only in prose,
mainly in a religious context, referring only to inanimate objects), thereby freeing the
lexicographer from the need to be over-specific in the definition. Kuhn (1980), however, points
out wryly that a definition must be sufficiently specific to avoid being meaningless.


11.7 LABELS OR TAGS 


Most historical dictionaries, like many smaller or synchronic dictionaries, employ 
usage labels, often abbreviated, to refer to a specific domain, such as ‘Physics’, ‘Medicine’,
‘Carpentry’, ‘Quarrying’, etc. Other labels refer to the usage of a term (such as ‘Obsolete’,
‘Informal’, ‘Historical’, ‘Colloquial’, ‘Vulgar’, etc.) as a guide to currency and register.
Some may refer to grammatical function (‘In plural’, ‘Transitive’, ‘As a noun’, etc.). Such
tags can be employed usefully to shorten definitions or to indicate the context of an oth- 
erwise ambiguous term. 

Labelling is discussed separately in Brewer (this volume), but as it is intimately linked 
with defining and quotation evidence, it requires some attention here. Verkuyl et al. 
(2003) discuss labelling in synchronic dictionaries, which they consider to be mainly 
concerned with decisions which the dictionary user needs to make in the production 
of language, rather than its interpretation. They distinguish between ‘group labels’ 
which ‘indicate [a] word as belonging to a group of speakers’ and ‘register labels’ which 
‘guide [a] user in choosing between alternatives’. They further subdivide ‘group labels’
into ‘geographical’, ‘temporal’, ‘professional’, and ‘social’ domains. Svensén (2009: 315-32)
likewise discusses labelling (which he calls ‘marking’) in synchronic and especially
bilingual dictionaries and attempts an even finer distinction between types of labelling. 

The ‘Now hist.’ label in the OED3 planet example quoted in Section 11.5 demonstrates
one major difference for historical dictionaries: the diachronic dimension. This
label, however imprecisely, shows how the meaning has developed from general use 
to a restricted historical sense. There are often big shifts over time, especially in regis- 
ter: what is considered highly technical in one period becoming part of everyday lan- 
guage in another, or what is considered markedly informal in one period becoming 
acceptable even in formal discourse in another. The example of appendicitis in relation 
to the original OED is well known (Mugglestone 2005: 140). The word was not included in 
the relevant part (Ant-Batten), which was published in 1885. When Edward VII’s corona- 
tion was delayed as a result of the condition, the word became very well known, and Murray 
was subsequently criticized for not having included it. 
Some words present particular difficulties, such as bloody in OED3 at sense 8a, reflecting
not only change in register over time but also spread between different varieties of
English: 


This word has long had taboo status, and for many speakers constituted the strongest 
expletive available. ... In most contexts the word's taboo status has now been largely or
entirely lost; the process of normalization seems to have begun earliest in Australia. 

Following the original use in England, Scotland, and Ireland, the sense spread 
to most other parts of the English-speaking world, with the notable exception of 
the United States, where it has apparently only ever achieved limited currency, e.g. 
among sailors during the 19th cent. (OED3) 


Some labels, such as the ‘Alchemy’ label in the OED3 planet example quoted in 
Section 11.5, immediately flag that a sense belongs to an area of discourse that belongs 
wholly or mostly to the past, although this is not stated explicitly. Some dictionaries use 
labelling only where it is necessary to differentiate otherwise ambiguous definitions, such 
as ‘division (in Math.)’, ‘vice (in Carp.)’, or ‘Eccl. living’. They are used to abbreviate definitions,
rather than in an attempt to differentiate all senses that belong to a specific domain.


11.8 FIGURATIVE AND TRANSFERRED USAGE 


A special use of labelling is to denote the figurative use of particular words: the use of 
the word in any sense other than its literal or concrete sense. Most words can be used 
figuratively, especially in poetic language, and this can complicate the process of defin- 
ing. Words may even be borrowed from another language in a figurative sense first, and 
then, at a later date, be borrowed in the original literal sense.

According to the OED, the English noun flock originally meant (1) ‘A band, body, or
company (of persons)’, as recorded in Old English, and only later developed the modern
meaning (2) ‘A number of animals of one kind, feeding or travelling in company. Now
chiefly applied to an assemblage of birds (esp. geese) or... of sheep or goats’, and later
still (3) ‘A number of domestic animals (chiefly, and now exclusively, of sheep or goats)
kept together under the charge of one or more persons’. It was from this last sense that
the religious metaphor developed: (4) ‘In spiritual sense, of a body or the whole body of
Christians, in relation to Christ as the “Chief Shepherd”, or of a congregation in relation
to its pastor’.¹² The following quotation is a figurative use (4, above) of sense (3), rather
than sense (1):


1588 J. Udall Demonstr. Trueth of Discipline (Arb.) 26 The minister is a shepheard, and 
his charge a flocke. 


12 These definitions and the accompanying quotations are from OED1 flock n.¹
Of course, this particular figure follows the figurative use of the equivalent Greek words
in the New Testament, adding a further level of complexity. 

A further complication can arise where a word is used in an extended figurative con- 
text, where a word's sense may appear to be literal, but it occurs in a context which itself 
is figurative, as for instance in this Welsh example: 


1731 T. Lewys: Bywyd a Marwolaeth yr Annuwiol 230, Nid oedd hyn ond ysgathru 
ymmaith Ganghennau pechod. [trans.: ‘This was but the cutting away of the branches of
sin.’]


Transferred usage generally refers to the transfer of the sense of a word or expression 
to another concrete object, such as in the following example where flock is used of inani- 
mate objects, metaphorically for sense (2): 


a1807 Wordsworth Prelude (1959) iii. 72, Lamps, Gateways, Flocks of Churches, Courts 
and Towers. 


The Oxford Dictionary of English (ODE) defines the verb transfer, usually in the adjectival
form transferred, thus: ‘change (the sense of a word or phrase) by extension or metaphor’.
GPC originally distinguished between ‘figurative’ and ‘transferred’. Some senses
were labelled as ‘figurative’ or ‘transferred’ or occasionally both, or alternatively labelled
as ‘in a figurative context’. However, because of the difficulty of applying this labelling
consistently, GPC has abandoned the regular use of ‘transferred’, as has OED3. Different
dictionaries employ these labels in different ways. The distinction between the terms
‘figurative’ and ‘transferred’ in OED1 is not always obvious, and Donna Lee Berg (1993:
183) points out that ‘Murray may have originally thought of transferred in more general
terms as a superordinate category of which non-literal applications such as figurative,
allusive, etc. were subordinate categories’, a rather different usage compared to GPC.
Allusive use is a kind of figurative use where an allusion is made to some well-known
source, often the Bible, or usage of a word or expression, such as the combination bread
and circuses which OED2 notes is a reference to ‘Juv[enal’s] Sat[ires] x. 80 Duas tantum
res anxius optat, Panem et circenses ... used allusively for food and entertainment,
esp. when provided by a government to assuage the populace’ followed by this quotation:


1967 Listener 16 Mar. 373/1, The almost Roman outpouring of stale bread and dull circuses
in what is supposed to be the more ‘entertaining’ side of television.¹³


The expression ‘with reference to ...’, or its equivalent, is often used in historical
dictionaries in such circumstances.


13 OED2, bread, n., online version March 2012, <http://www.oed.com/view/Entry/22888>; accessed 28
April 2012.
Generally, usage labels seem to be used more as an aid to refining definitions than in
an attempt to impose some sort of ontological system on the lexicon. OED Online, how- 
ever, has gone further by cross-linking the OED proper and the Historical Thesaurus of 
the Oxford English Dictionary (HTOED). 


11.9 READING PROGRAMMES 
AND QUOTATION SLIPS 




Most historical dictionaries have made extensive use of a reading programme, that is a 
programme using voluntary or paid readers to examine a collection of specific texts and 
record words in their context in the form of quotation slips (or their electronic equiva- 
lent). Even before James Murray began work on the OED, the Philological Society had 
arranged a volunteer reading programme for the Dictionary. The quotation slip will 
normally include the headword and the date of the example together with a phrase or 
sentence containing the word in its context and a reference to the source, usually in an 
abbreviated form. Such slips are then sorted into alphabetical order and often also into 
chronological order within that sequence. This collection forms the raw material from 
which the dictionary is edited. 
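
The slip described above can be pictured as a small record. The following Python sketch is purely illustrative: the class name, the field names, and the sort key are assumptions drawn from the description in this section, not the data model of the OED, GPC, or any other project. Dates are simplified to a single year, so an approximate date such as 'a1807' is recorded here as 1807.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotationSlip:
    headword: str   # the word being illustrated
    year: int       # date of the example (simplified to a single year)
    quotation: str  # phrase or sentence showing the word in context
    source: str     # abbreviated reference to the source text


def sort_slips(slips):
    """Order slips alphabetically by headword, then chronologically."""
    return sorted(slips, key=lambda s: (s.headword.casefold(), s.year))


slips = [
    QuotationSlip("flock", 1807,
                  "Lamps, Gateways, Flocks of Churches, Courts and Towers.",
                  "Wordsworth Prelude iii. 72"),
    QuotationSlip("bread", 1967,
                  "The almost Roman outpouring of stale bread and dull circuses",
                  "Listener 16 Mar. 373/1"),
    QuotationSlip("flock", 1588,
                  "The minister is a shepheard, and his charge a flocke.",
                  "J. Udall Demonstr. Trueth of Discipline 26"),
]

ordered = sort_slips(slips)
for slip in ordered:
    print(slip.headword, slip.year, slip.source)
```

Sorting on the pair (headword, year) reproduces the two-level ordering of a paper slip file; an electronic record could of course be re-sorted on any other field, which is exactly what a paper slip cannot offer.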

To some extent, reading programmes have been supplanted by electronic corpora 
(see Brinton, this volume), although many historical dictionary projects still make 
use of reading programmes, including the OED which has four separate reading pro- 
grammes in operation, embracing current, scientific, historical, and country-specific 
programmes (and today mostly generating electronic records rather than the traditional 
paper slips). Reading programmes are especially useful for collecting unusual or rare 
words as it is easier for readers to spot unfamiliar words or unusual usages of words than 
it is for them to record a representative sample of the vocabulary. Computer-generated 
corpora, on the other hand, are all-inclusive, and for languages where the known corpus 
of texts is comparatively small, such as in the case of Old English, they can be extremely 
useful because of their very inclusivity. 

Traditional quotation slips have many disadvantages, including the space required 
for their storage, the fact that the material stored on them is only accessible via the head- 
words recorded on them, their susceptibility to loss by fire, and the fact that only a cer- 
tain amount of data can be recorded on them. However they do have the advantage of 
being easy to distribute into groups (e.g. according to various senses), or to sort into 
chronological order, or even, as in the case of the original OED, to mark up and send 
directly to the compositors for printing. Slips may also contain non-textual material, 
such as photographs, diagrams, or sketches.¹⁴


14 The GPC slips contain at least one example of a pressed plant.
11.10 USE OF WRITTEN SOURCE MATERIAL


11.10.1 Manuscript Sources 


The OED and many other historical dictionaries have tended to avoid the use of hand- 
written sources for obvious reasons: such sources are unedited and therefore need to 
be transcribed, and can be difficult to reproduce accurately in print. They are generally 
less accessible to dictionary users than printed sources, although online access to digi- 
tal copies is rapidly removing that disadvantage. The Early English Text Society (EETS) 
was established by F. J. Furnivall expressly in order to provide edited texts for the use of 
the OED editors (see Benzie 1983: 117ff.). This obviated the need for the editors to resort 
to the original manuscripts or inadequate editions of texts which they needed to cite. 
When the editing of GPC began in 1947, the state of Welsh studies was such that much 
recourse to original manuscripts was necessary. Facsimiles (either typographical or pho- 
tographic) were used where possible in preference to the original manuscripts in order 
to provide easier access for readers to the texts. During the course of editing the diction- 
ary, many works have subsequently been edited to a high standard, and less dependence 
is now necessary on manuscript sources. In recent years an increasing number of manuscripts
have become available in online transcriptions, often with accompanying manuscript
images, and the GPC editors have begun to cite more extensively from these.¹⁵


11.10.2 Printed Material 


Textual material can occur in many printed forms, such as books, pamphlets, newspa- 
pers, magazines, and journals, and also in the form of printed ephemera such as theatre 
programmes, tickets, handbills, and posters. Traditionally, historical dictionaries have 
tended to place more emphasis upon printed literary texts as a source of quotations, and 
have tended to deprecate newspapers, magazines, and other less permanent publica- 
tions. Charlotte Brewer has shown how the early OED lexicographers were heavily influ- 
enced by the social values of the time, giving precedence to the perceived ‘best’ authors 
in deciding which quotations to include in the dictionary (see also Considine 2009), 
citing comparatively little evidence from contemporary newspapers (although Murray 
clearly recognized their value (Brewer 2007a: 117-18, 234-6)), and paying insufficient
attention to the eighteenth century¹⁶ and women authors. As precedence was also given
to acknowledged authors in the reading programme, it is hardly surprising that the quo- 
tations in the first edition of the OED heavily over-represented well-known authors. 


15 See, e.g., ‘Rhyddiaith Gymraeg 1350-1425 / Welsh Prose 1350-1425’ at
<http://www.rhyddiaithganoloesol.caerdydd.ac.uk/>.

16 On the eighteenth-century evidence see Brewer (2007a: 128-9) and especially (2005b–:
<http://oed.hertford.ox.ac.uk/main/content/view/93/237/>).
This tendency was not limited to English dictionaries alone. The first edition of
the Anglo-Norman Dictionary (AND) was originally intended more as a glossary of 
non-standard French used in England, and as such almost entirely ignored literary texts, 
whilst the second edition, which is now in progress, cites extensively from literary texts 
but also draws heavily on the substantial documentary evidence. The digital AND is 
interesting for the way in which it has been designed to integrate with a set of digitized 
texts, and to permit other dictionaries to link directly to it by using immutable refer- 
ences for each headword (Trotter 2011). 


11.10.3 Electronic or Digitized Resources 


Increasingly, historical dictionaries are using online evidence from the World Wide 
Web, occasionally citing electronic resources such as electronic editions of texts, as 
well as ordinary web pages, blogs, and web corpora. There are two main problems with 
material of this kind: the fact that it is often ephemeral and the related problem of how 
to cite it within the dictionary. Web resources often change location and can disappear 
altogether in a relatively short time. One solution to this problem is to use abbreviations
to refer to the resource, giving the URL¹⁷ only in the bibliography, which can then
be updated as necessary. This method is perhaps better suited to printed dictionaries 
than to online dictionaries, where hyperlinked URLs would be more useful to the users, 
but is of course of no use if the resource disappears completely from the Web. Another 
method which has been suggested is to create a digital repository of all the resources 
used in the dictionary to which reference would otherwise be difficult, and then to place 
references in the dictionary to the repository. In this way, it would be possible, for exam- 
ple, to incorporate an image of some ephemeral item, such as a playbill, and to reference 
it uniquely from within the dictionary. This method has much to recommend it, par- 
ticularly for online dictionaries. 
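
The first of these approaches, citing by abbreviation and keeping the address only in the bibliography, amounts to a single level of indirection. The sketch below is a hypothetical illustration in Python: the abbreviation 'RhG', the function names, and the replacement address are all invented for the example, and only the original address is taken from this chapter.

```python
from dataclasses import dataclass


@dataclass
class Source:
    title: str
    url: str  # the address is stored here and nowhere else


# Bibliography keyed by abbreviation. 'RhG' is a made-up siglum for illustration.
BIBLIOGRAPHY = {
    "RhG": Source("Rhyddiaith Gymraeg 1350-1425",
                  "http://www.rhyddiaithganoloesol.caerdydd.ac.uk/"),
}


def render_citation(siglum, locator, online=False):
    """Render a citation; only the online rendering exposes the URL."""
    source = BIBLIOGRAPHY[siglum]
    text = f"{siglum} {locator}"
    return f"{text} <{source.url}>" if online else text


def relocate(siglum, new_url):
    """Update the address once, in the bibliography, when a resource moves."""
    BIBLIOGRAPHY[siglum].url = new_url


print_form = render_citation("RhG", "12")
relocate("RhG", "http://example.org/rhg/")  # hypothetical new location
online_form = render_citation("RhG", "12", online=True)
print(print_form)
print(online_form)
```

Because entries store only the abbreviation, a change of address requires one correction rather than one per citation. As the text notes, this does nothing to help if the resource vanishes altogether.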

Very little in the way of printed ephemera has been digitized so far, with the notable 
exception of the John Johnson Collection of Printed Ephemera at the Bodleian Library, 
Oxford.¹⁸

The availability of collections of high-quality searchable digitized copies of printed 
books, such as those available under the EEBO¹⁹ and ECCO²⁰ schemes, has transformed
the way lexicographers of English work. (See also Brinton, this volume.) Although
ostensibly collections of English-language texts, these two resources also contain many


17 The uniform resource locator (URL) is usually a reference to the location of a resource in the
filestore of a particular web server, often referred to as its ‘address’ or ‘Web address’, although this may be
a virtual address which redirects the user’s browser to another location. Virtual addresses are sometimes
used in this way to provide so-called ‘persistent addresses’.

18 See <http://www.bodleian.ox.ac.uk/johnson>.

19 Early English Books Online, see <http://eebo.chadwyck.com/home>.

20 Eighteenth Century Collections Online, see
<http://gale.cengage.co.uk/product-highlights/history/eighteenth-century-collections-online.aspx>.
volumes in other languages, including hundreds in Welsh. They form an invaluable
resource for both the OED and the GPC editors, enabling them to search for further 
examples of a particular wordform. Both collections have recently been combined 
together with the British Library’s collection of digitized nineteenth-century books, in 
a resource called ‘JISC Historic Books’.²¹ An additional advantage for the historical
lexicographer of having both the digitized text and the digitized image of the text is that the
verification of quotations is very much quicker. Many other such resources are becoming
available regularly, such as the British Library’s nineteenth-century Newspapers
Project,²² the National Library of Wales's ‘Welsh Journals Online’²³ and ‘Welsh
Newspapers Online’,²⁴ and a host of other similar resources. Both ECCO and EEBO have
benefited from additional collaborative projects to provide very high quality searchable
texts based on XML that are the result of an extensive project by the Text Creation
Partnership²⁵ to key in a very large representative sample of the collection. A useful, but
occasionally frustrating, resource is Google Books, a collection of digitized books and
journals from some of the world’s major libraries, which the search engine company
Google has scanned and made available for searching online.²⁶ The quality of the output
of the automatic optical character recognition (OCR) software used to process these
images can be fairly low, leading to many inaccuracies in searching the indexed text 
which can make searching difficult or impossible. Also the bibliographical metadata for 
the scanned books contain numerous errors, making accurate citation from them dif- 
ficult on occasions. However the sheer number of volumes now available has greatly 
enhanced the lexicographer’s ability to find additional examples of nineteenth-century 
texts in particular. Copyright restrictions prevent Google Books from displaying the 
full text of most twentieth-century and some late nineteenth-century publications, but 
Google encourages publishers to contribute searchable samples (or even the entire text) 
of their current publications, with the result that lexicographers have access to much 
contemporary material.

In the case of Web-based digitized resources, the lexicographer is at the mercy of the 
search interface of the resource concerned. These can vary from the extremely pow- 
erful (such as that provided for the OED Online) to the barely usable (such as some 
digitized library collections, which do not even permit searching for a phrase). In many 
ways it is preferable to download, if possible, the full text of a book or article, so that 
it may be incorporated into a fully searchable collection under the lexicographers’ 
control. This is particularly true of highly inflected languages or languages with initial 
mutations, such as the Celtic family of languages, where the initial letter of a word can 


21 See <http://www.jischistoricbooks.ac.uk/>. The collection includes over 370,000 books.
22 See <http://www.jisc-collections.ac.uk/Catalogue/Overview/index/32>.

23 See <http://welshjournals.llgc.org.uk/>.

24 See <http://welshnewspapers.llgc.org.uk/en/home>.

25 See <http://www.textcreationpartnership.org>.

26 See <http://books.google.com/>.
change depending upon its grammatical context, making normal word-based searching
cumbersome to use.²⁷
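
To make the difficulty concrete, the following Python sketch builds a single regular expression that matches a Welsh word in its radical form and under the soft, nasal, and aspirate mutations. It is a simplified illustration, not the search method of any dictionary project: the function names are invented, the table covers initial consonants only, and it ignores h-prothesis before vowels and all inflection at the end of the word.

```python
import re

# Initial consonant mutations of Welsh: soft, nasal, and aspirate where applicable.
MUTATIONS = {
    "p": ["b", "mh", "ph"],
    "t": ["d", "nh", "th"],
    "c": ["g", "ngh", "ch"],
    "b": ["f", "m"],
    "d": ["dd", "n"],
    "g": ["", "ng"],   # soft mutation deletes initial g
    "m": ["f"],
    "ll": ["l"],
    "rh": ["r"],
}
# Initial digraphs that count as single letters in Welsh and do not mutate.
IMMUTABLE_DIGRAPHS = ("ch", "dd", "ff", "ng", "ph", "th")


def mutated_forms(word):
    """Return the radical form plus every initial-mutated variant."""
    w = word.lower()
    forms = {w}
    if w.startswith(IMMUTABLE_DIGRAPHS):
        return forms
    # Try two-letter initials (ll, rh) before single letters.
    for initial in sorted(MUTATIONS, key=len, reverse=True):
        if w.startswith(initial):
            rest = w[len(initial):]
            forms.update(m + rest for m in MUTATIONS[initial])
            break
    return forms


def mutation_pattern(word):
    """Compile a whole-word regex matching any mutated form of the word."""
    alternatives = sorted(mutated_forms(word), key=len, reverse=True)
    body = "|".join(re.escape(a) for a in alternatives)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)


hits = mutation_pattern("cath").findall("Mae fy nghath a'i gath yma.")
print(hits)
```

A plain search for cath would find neither form in the sample sentence, since both occurrences are mutated; the generated pattern finds both.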

‘Stemming’ (where a search for the stem of a word generates a search for all word- 
forms beginning with that stem) is offered by some digitized collections, which is more 
useful for English than for the Celtic languages. 
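
The contrast can be seen in a few lines of Python. This is a minimal sketch of prefix-based stem searching as described above, with an invented function name and made-up word lists; real collections may implement stemming quite differently.

```python
def stem_search(stem, tokens):
    """Return every token that begins with the given stem (case-insensitive)."""
    stem = stem.casefold()
    return [t for t in tokens if t.casefold().startswith(stem)]


english = ["walk", "walks", "walked", "walking", "talk"]
welsh = ["cath", "cathod", "gath", "nghath", "chathod"]

english_hits = stem_search("walk", english)  # finds all four forms of 'walk'
welsh_hits = stem_search("cath", welsh)      # misses the three mutated forms
print(english_hits)
print(welsh_hits)
```

English inflection falls at the end of the word, so a search anchored at the beginning catches it. Welsh mutation changes the beginning of the word, which is precisely the part a stem search holds fixed.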


11.11 SPECIFIC TYPES 
OF QUOTATION EVIDENCE 


11.11.1 Quotations from Other Languages 


Sometimes it is necessary to cite an example of a word which occurs in a different lan- 
guage context, for example an English word occurring in an Anglo-Norman document, 
or a Welsh word in one of the Latin law books. The OED uses square brackets to denote 
such material, while some dictionaries, such as GPC, cite the text as a normal quotation,
although other dictionaries may choose to cite such material by comparing it at the end
of an entry, as is sometimes done in GPC. The former method has much to recommend
it, as it is then possible to provide a date for the quotation, especially where the quotation
is the earliest in that sense paragraph. Often a word may occur in a place-name or a personal
name earlier than it is recorded in a literary context. Such words are evidence for
the existence of the word at an earlier date than the earliest literary attestation, but it can 
be difficult to know how to refer to such evidence. In Welsh, many common nouns are 
first attested in place-names in the twelfth-century Liber Landavensis, usually in lists of 
parish boundaries in a Latin context. Again, GPC gives these as ordinary quotations, but 
cites most other examples from place-names at the end of the relevant sense paragraph. 


11.11.2 Dictionary Evidence


Dictionaries, both printed and manuscript, can provide evidence for the existence of a 
word or phrase at a particular date. However, there are a number of difficulties in citing 
evidence from dictionaries. The inclusion of a word in a dictionary is considered to be 
secondary evidence, as opposed to the occurrence of a word in a text, which would be 
considered to be primary evidence. Some (e.g. Osselton 1995: 137-47) have suggested
that such secondary evidence should be treated differently from primary evidence, and 


27 ‘Regular expressions’ (a computer notation system for complex textual searches) can be very useful 
for searches of this kind. The GPC staff have found the program Filelocator Pro by Mythicsoft to be 
particularly suitable for performing complex searches using regular expressions across a large collection 
of texts in many different formats. 
cited separately in the dictionary. The fact that a word occurs in a dictionary published
in a particular year does not indicate that the word was current at that time, simply that 
the earlier lexicographer had knowledge of it at that time. 

Dictionaries, particularly early dictionaries, also contain many erroneous or nonce 
words, acquired perhaps by miscopying, or by misunderstanding another dictionary 
or text. It could be argued, therefore, that such entries should be treated separately. A 
further problem with dictionary entries is that they are often glossed ambiguously and 
it may be impossible to determine to which sense paragraph they should belong. The 
lexicographer should also be aware that a word used as a gloss may have changed its 
meaning, and that inexperienced dictionary users may misunderstand the meaning of 
a word by assuming that the gloss has its current meaning, rather than that given in the 
definition. This is a further argument for treating dictionary evidence differently from 
primary evidence. An alternative method of treating dictionary evidence is to label the 
senses derived from such evidence with such tags as ‘dict.’, ‘originally dict.’, or ‘mainly
dict.’, the last two being used chiefly in cases where some writers have adopted words
from dictionaries in their writing. 


11.11.3 Quotations as Evidence for Naturalization


Quotations, simply by the form in which they occur, can be very revealing about the his- 
tory of loanwords. The following series of quotations illustrate well the process by which 
an English word is naturalized into Welsh. In the first example, the borrowed word is 
spelt using English orthography and inserted between quotation marks:²⁸


1947 Y Fflam ii. 66, Dyma’r nofel gyffrous, fel y dywaid y ‘blurb’ y tu mewn i’r clawr,
a ddyfarnwyd yn orau yn Eisteddfod Genedlaethol Bangor. [trans.: ‘This is the exciting
novel, as the blurb inside the cover says, that was chosen as the best in the Bangor
National Eisteddfod.’]


This is essentially the English word in a Welsh context, and would normally not warrant 
inclusion in a Welsh dictionary. In the following example, the word is spelt using Welsh 
orthography but is still inserted within quotation marks: 


1958 Lleufer xiv. 139, Rhed y stori o fewn ffram syniadol a amlinellir yn eithaf taclus a
chywir yn y ‘blyrb’ ar ei siaced lwch. [trans.: ‘The story runs within a conceptual frame
that is outlined quite neatly and correctly in the blurb on its dust jacket.’]


Respelling has made it look like a Welsh word, but the writer still considers it to be 
non-standard. Finally, in the last example, the word appears to have been completely 


28 All the following examples are cited from GPC2 s.v. blyrb, online version:
<http://geiriadur.ac.uk/gpc/gpc.html?blyrb>.
naturalized: it has acquired a standard Welsh plural termination and quotation marks
are no longer used: 


1965 Efr[ydiau] Athr[onyddol] xxviii. 55, I gael syniad cynhwysfawr o’i feddwl [William
Williams Pantycelyn], rhaid astudio...r[h]ai o'r blyrbiau doniol ar gyfer rhai o’i lyfrau.
[trans.: ‘To get a comprehensive idea of his thinking, the amusing blurbs for some of his
books must be studied.’]


For a study of a similar process by which Japanese words are assimilated into English, 
as demonstrated by the quotation evidence from a number of British and American 
English dictionaries, see Kimura (2000). 


11.12 COLLOQUIAL, DIALECTAL, 
AND SLANG EVIDENCE 


Printed and manuscript sources are by no means the only evidence which can be used by 
the historical lexicographer. For living and recently defunct languages, such as Manx,29
considerable colloquial evidence may exist in the form of transcriptions or recordings. 
For example, in the case of Welsh, the St Fagans National History Museum in Cardiff 
has an extensive collection of recordings of colloquial speech from various areas of 
Wales, recorded since the 1950s, the oldest speaker recorded having been born in 1858. 
Colloquial speech can also occur in transcriptions of speeches, lectures, sermons, etc., 
and to some extent in the form of film, television, or radio scripts. While such mate- 
rial has often been standardized in transcriptions, it can still form a useful source of 
information. An early initiative by the Guild of Graduates of the University of Wales 
led to the production of a number of lists of colloquial usage for specific geographical 
areas in Wales at the beginning of the twentieth century. A large number of postgradu- 
ate theses have been published on particular Welsh dialects, which form a very useful 
source for Welsh lexicographers, as does The Linguistic Geography of Wales (Thomas
1973), a linguistic atlas of Welsh dialects which summarizes the results of a comprehen- 
sive questionnaire-based survey, similar to the Leeds Survey of English Dialects (Orton 
1962). In English, Joseph Wright's English Dialect Dictionary is a substantial and valuable 
source for English lexicographers which was published between 1898 and 1905, contem- 
poraneously with the OED. To some extent, this permitted the editors of the OED to pay 
less attention to dialectal words and usage than they might otherwise have done. This 
has broadly persisted into the revision of the OED: ‘regional English is typically only
included if it is recorded in a reasonably wide geographical area, and is therefore known
to a considerable number of speakers’ (Simpson 2008: 118). For a discussion of Murray’s
explanation of his understanding of the differences between colloquial and ‘common’
vocabulary and slang, see Berg (1993: 102). In general, GPC’s treatment of regional variation
is more extensive than that of the OED, although it is only a superficial survey of the
available evidence, provided in the absence of a true dialect dictionary,30 and it is by no
means comprehensive.


29 The last native speaker of Manx, Edward ‘Ned’ Maddrell (1877-1974), was recorded extensively by
the Irish Folklore Commission.

GPC uses the label ‘Ar lafar’ (= ‘colloquially/in spoken language’) to denote collo- 
quial or dialectal usage without differentiation. The evidence for such colloquial usage 
may have been published a hundred or more years ago, and reflects the speech of (often 
elderly) speakers of a generation or two earlier, who could have been born up to two 
hundred years ago. Alternatively the evidence may be as recent as a quotation from a 
blog (a self-authored piece published on the Web). Clearly, greater distinction is needed 
between these various usages. Wherever the term ‘colloquial’ is used, consideration 
should be given to dating the quotation as well as noting the location (if the area in 
which it is used appears to be geographically restricted). It may be preferable to cite from 
early dialectal studies as ordinary dated quotations, and reserve the label ‘colloquial’ for 
words which are believed to be still current at the time of editing. 

Much of the evidence for slang vocabulary of this kind is colloquial in nature and 
tends to appear in historical dictionaries on the basis of printed secondary sources, such 
as wordlists, vocabularies, or specialist dictionaries. (Compare Coleman, this volume.) 
In some cases a cross-reference to a specialist dictionary may be given rather than a 
quotation. As it is often difficult to date specific colloquial examples, such material is 
difficult to use for dating purposes, although in some cases, recorded colloquial use may 
predate literary evidence. 


11.13 VERIFICATION OF QUOTATIONS 


Because of the importance of quotations as evidence for the senses defined in the entry, 
it is essential that they are cited as correctly as possible. This usually means that all quo- 
tations should be verified against the original source. This is relatively easy with elec- 
tronic sources (provided the source can be found), but it becomes increasingly difficult 
where rare books or inaccessible manuscripts are concerned. Large dictionary projects 
such as the OED may employ specialist staff in some of the major libraries whose pri- 
mary duties are to do library research upon request from the editors and to verify quota- 
tions. Most projects, however, have to rely on the editorial staff or clerical assistants to 
verify quotations. 

Quotations may contain printing or scribal errors which need to be corrected. A labelling
method such as placing ‘[sic]’ after an erroneous form shows the reader the original
form but indicates that the lexicographer believes it to be erroneous. This has the
disadvantage, however, of not necessarily indicating what the lexicographer considers to
be the correct form. An alternative is to cite the corrected text with an abbreviation to
indicate that it has been revised, such as ‘[leg.]’. This method has the disadvantage of not
citing the original text and places more reliance upon the lexicographer’s judgement.
A combination of both methods is possible, where the original text is given together
with an editorial emendation, e.g. ‘A torel [read towel]’. Some texts, especially manuscripts,
may suffer from holes or torn pages, and such lacunae need to be indicated in
quotations. Often quotations may be too lengthy to include in full, or digress unnecessarily,
and it may be necessary to mark an ellipsis in the text using ‘...’.


30 An online dictionary of the dialects of Welsh is in preparation by Peter Wynn Thomas of Cardiff
University.

One of the difficulties of referring to early printed material can be that pages are either 
unpaginated or incorrectly paginated. In order to refer to such material, the page num- 
bering has to be inferred, and inserted within square brackets. Some works may contain 
several sections, with the numbering restarting at the beginning of each new section. 
Each dictionary will have a policy on such matters, permitting the source to be cited 
without confusion, but which may require a note in the dictionary’s bibliography refer- 
ring to a pagination problem in a particular text, or it may be necessary to refer to each 
section explicitly. Where pagination is either absent or particularly erroneous, it may be 
possible to use the printers’ signatures31 that occur in most earlier printed works.

Manuscripts often contain more than one set of page or folio numbering, particularly 
where several manuscripts have been bound together. It is advisable to use the latest 
pagination in the manuscript, often provided by the holding institution, and used by it 
in catalogues and handlists. The dictionary’s policy will determine whether to use ‘a’ and
‘b’, ‘r’ and ‘v’, or some other method to refer to the recto and verso sides of manuscripts’
folia when a manuscript is foliated rather than paginated.

In cases where the work has been translated from another language, it may be use- 
ful to the reader to be given the original word or phrase in the other language. Many 
seventeenth- and eighteenth-century works (especially religious works) were translated 
into Welsh from English and other languages, and GPC occasionally cites the original 
in parentheses after the word or phrase in question to assist the reader. GPC does not, 
however, give references to the source of the original, although this would be a useful 
addition to the dictionary’s bibliography.


31 These are sequences of letters and occasionally numbers printed at the foot of recto pages indicating
the binding order of the gatherings or folded sections of the book to the binders. Signatures themselves
can also be erroneous.


11.14 DATING OF QUOTATIONS


Dating quotations can be problematic. Sometimes it is only possible to state to which
century (or span of centuries) a particular quotation belongs, especially in the case of
undated manuscripts. Where dating is unsure, the date may be preceded by a question
mark. In some cases, it may be necessary to use circa, post, or ante to indicate an
approximate date. Manuscripts can cause particular problems with dating because often 
those that survive are copies of earlier originals, the date of which can be very difficult 
to ascertain. In the case of some languages, notably Old English, it can be almost impos- 
sible to date most texts precisely. Some dictionaries cite quotations from manuscripts 
according to their supposed date of composition, although this may be several centuries 
earlier than the cited text. Other dictionaries adhere rigidly to the date of writing or 
copying. Some employ both methods depending on the type of material cited. This is the 
case with GPC, for example, where edited poetry is usually dated by the supposed date 
of its original composition, whereas prose manuscripts which are cited from a particular 
manuscript are dated according to the date of the manuscript copy. This occurs mainly 
because there are far fewer prose manuscripts, and those which have survived are gen- 
erally closer to the original date of composition than much of the surviving poetry. A 
single poem may occur in several hundred different manuscripts in the case of the best- 
known poetry. 

This can, of course, lead to inconsistencies, but the problem is mostly limited to those 
languages which have a large proportion of literature in unedited manuscript sources 
only. As more and more of these texts are edited, the problem diminishes. 
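
The dating conventions described in this section (a question mark for uncertain dates, circa/ante/post qualifiers, and spans of centuries) can be modelled in a small data structure. The following is only a minimal Python sketch and does not describe the system of GPC, the OED, or any other dictionary; the class name `QuotationDate`, its fields, the one-letter qualifier codes, and the sort rule are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical one-letter codes: exact date, circa, ante, post.
QUALIFIERS = {"", "c", "a", "p"}


@dataclass(frozen=True)
class QuotationDate:
    year: int                       # nominal year (first year of a span)
    qualifier: str = ""             # "", "c" (circa), "a" (ante), "p" (post)
    uncertain: bool = False         # shown with a leading question mark
    end_year: Optional[int] = None  # set only for a span such as 1300-1400

    def __post_init__(self):
        if self.qualifier not in QUALIFIERS:
            raise ValueError(f"unknown qualifier: {self.qualifier!r}")
        if self.end_year is not None and self.end_year < self.year:
            raise ValueError("end_year precedes year")

    def label(self) -> str:
        """Render the date as it might be printed before a quotation."""
        text = str(self.year)
        if self.end_year is not None:
            text += f"-{self.end_year}"
        if self.qualifier:
            text = f"{self.qualifier}{text}"
        return f"?{text}" if self.uncertain else text

    def sort_key(self):
        # Assumed ordering within one year: ante, then exact/circa, then post.
        rank = {"a": 0, "": 1, "c": 1, "p": 2}[self.qualifier]
        return (self.year, rank)


dates = [
    QuotationDate(1588),
    QuotationDate(1400, "c"),
    QuotationDate(1588, "a", uncertain=True),
    QuotationDate(1300, end_year=1400),
]
ordered_labels = [d.label() for d in sorted(dates, key=QuotationDate.sort_key)]
```

Whether a manuscript quotation is given the date of composition or the date of the copy remains an editorial decision; a structure of this kind only records the outcome of that decision so that quotations can be ordered consistently.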


11.15 ARRANGEMENT AND PRESENTATION
OF QUOTATIONS 


The arrangement of quotations varies considerably from one dictionary to another. 
Most historical dictionaries follow the practice of the OED in listing relevant quotations 
after the definitions in each sense paragraph. Some, however, such as the Middle English 
Dictionary (MED), conflate a number of definitions into one or more paragraphs, with 
a letter to denote each sense, and then list all quotations for those definitions, distin- 
guished by the same letters. This has the advantage of saving some space in the printed 
work but is more difficult for the reader. The first and second editions of the OED listed 
the quotations for their collocations in this manner, whereas the third (online) edi- 
tion, freed from stringent space restrictions, now lists the relevant quotations after each 
collocation. 

In describing languages with little inflection, such as English, it may be sufficient to 
leave the wordform representing the headword unmarked in the quotation, but in more 
highly inflected languages, such as Welsh or German, marking the wordform will assist 
the reader. This may be done by italicizing the word, or by preceding it with ‘*’ or some
other symbol. Marking the word also has the advantage of specifying which of multiple 
wordforms (should there be more than one) it is intended to exemplify in the quotation. 
However, if italicization is used for this purpose, it may be undesirable to retain any 
italicization from the original quotation. The original quotation may use italicized text
to denote borrowings from another language, a fact which it might be useful to retain. 
For this reason, it may be preferable to use a symbol instead of italicization. 

Citing from other dictionaries may present some problems. In citing from dictionar- 
ies in the same source language as that of the historical dictionary, it should be possible 
to cite directly without having to provide a page or folio reference. However, where the
word occurs within the body of another entry, it will be necessary to state under which 
word the quotation is listed. Early dictionaries may not necessarily be in strict alpha- 
betical order, with related words grouped together, or words with the same prefix treated 
together, and it may be necessary to provide additional information in order to permit 
the reader to locate the quotation. 

During the thorough revision of a historical dictionary, it is desirable to re-verify quo- 
tations, to trace quotations from printed sources back to the earliest printing, or at least 
to determine whether a newer or more accurate edition can be cited. This is particu- 
larly true of unpublished texts in manuscript form. Consideration should also be given 
to citing from online sources, especially in online historical dictionaries, perhaps with 
a hyperlink to the source, if it is believed that the source will remain available online 
at that address for the foreseeable future. The team revising the OED has been system- 
atically revising the quotations from certain key works which are cited extensively in 
the dictionary, tracing them back to the first printing wherever possible. For example, a 
recent update reported ‘the completion of work converting the 6,000+ quotations in the
OED from Edmund Spenser’s Faerie Queene from the various editions cited in the first
edition of the dictionary to the first printing of this seminal work’.32

There is some disagreement over how to cite from works whose orthography has been 
standardized during editing. For example, in Welsh, it is standard practice to edit poetry 
according to the standard modern orthography and rules of punctuation, whereas 
prose texts generally retain their original orthography. For this reason, GPC has tended 
to quote medieval poetry from certain well known early manuscripts, where possible, 
rather than from more modern editions of those texts. This assists those scholars who 
require access to the original text, but it is less useful to the general reader who would 
be deprived of the standardized spelling and the critical apparatus in the edition. Every 
dictionary will have to determine its own policy in this respect. An online dictionary, 
however, could usefully show both versions, or offer the reader a choice of which to dis- 
play, although this would double the burden of verification. 

Some historical dictionaries provide translations of their quotations. For example, the 
Dictionary of the Irish Language (DIL), a dictionary of Old and Middle Irish, provides 
translations of many of its quotations. Such a practice is useful for readers, particularly 
in the case of dead languages, but it can place an undue burden on the lexicographer 
in trying to provide translations of all quotations, particularly when citing unedited 
texts. Copyright considerations permitting, an online dictionary could usefully provide
translations of some quotations from reliable editions which provide such translations,
perhaps as an option.


32 See the report by John Simpson at <http://www.oed.com/public/simpson1211>, accessed 5
June 2012.


11.16 SELECTION OF QUOTATIONS 


The intelligent selection of quotations is normally essential in order to keep the size of 
the dictionary within reasonable bounds. Most historical dictionaries will cite the earli- 
est attestation of a word, no matter where it occurs, even though this may contradict 
other editorial policies. Within each sense paragraph, quotations are chosen to illustrate 
a set of related senses. Quotations will be chosen depending on a number of criteria. As 
Simpson remarks: 


Editors are instructed to select from the wealth of illustrative documentation. This 
selection is based on many criteria: we seek to illustrate the variety of genres in which 
a term is used; the introduction of major variant spellings; geographical and chrono- 
logical spread; quotations which add (through their text) some historical semantic 
information which cannot be included in the definition; etc. (Simpson 2008: 121) 


Some dictionaries, GPC amongst them until very recently, adopt the practice of giving 
simply a year of first attestation rather than a full quotation in cases where it is difficult 
either to provide a full quotation or to verify it. This is not entirely acceptable, as it leaves 
the reader wondering what the quotation may be and why it has not been provided in 
full. Ideally the dictionary should be able to cite any text, but practical considerations 
may make this impossible. An alternative, especially for an online dictionary, would be 
to cite the text fully and mark it as unverified, so as to warn the reader that it may be 
unreliable. 

Editors may have at their disposal hundreds or thousands of quotations which could 
be used to illustrate the use and meaning of a single word. With access to modern cor- 
pora, they may indeed have millions of examples of some common words, even in a 
lesser-used language such as Welsh.33 Even if it were possible to present all this evidence
to the reader it would, of course, be totally overwhelming. It is vital therefore
that a policy exists which guides the editors in their selection of appropriate quotations. 
Sheidlower (2011) has discussed in considerable detail the OED’s methods of selecting 
quotations from the vast collection available to the editors. General historical dictionaries
of major languages, such as the OED, will often have a large number of quotations
available for any given sense, whereas period dictionaries and dictionaries of lesser-used 
languages, such as GPC, will have a much more restricted collection. 


33 In a web corpus developed for the editors of GPC, the commonest form (yn, which in fact
represents several different words) occurs over 5 million times. 
All dictionaries will have guidelines on the number of quotations to be cited within a
sense paragraph in order to keep the dictionary within reasonable bounds, and so as not 
to overburden the reader with too many quotations. The policy will vary according to the 
type of dictionary, whereby a shorter historical dictionary may include only two or three 
examples per sense, while a full historical dictionary may include 10, 20, or 30 quotations 
per sense, or even more. Online dictionaries may have the benefit of being able to sup- 
press the display of quotations in order to make the structure of the entry clearer, and in 
such a case it may be possible to include a larger number of quotations than would nor- 
mally appear in a printed dictionary. Similarly, whereas a print dictionary will probably 
have the quotations set in continuous type within a paragraph, an online dictionary may 
be able to display one quotation per line, which can be easier to read. The OED Online 
website lists each quotation on a separate line, with the option to hide the quotations 
within specific sense paragraphs or in all of them, replacing them with the date range for 
those quotations. The online MED offers the user a choice between a run-on paragraph of 
quotations (as in the print edition), line-by-line quotations, or no quotations. 
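
The three display options mentioned above (one quotation per line, a run-on paragraph, or quotations hidden and replaced by their date range) can be sketched in a few lines of Python. This is a rough illustration only and is not the implementation used by OED Online or the online MED; the function name `render_quotations`, the mode names, and the representation of a quotation as a (year, text) pair are assumptions.

```python
def render_quotations(quotations, mode="lines"):
    """Render (year, citation_text) pairs in one of three display modes."""
    ordered = sorted(quotations, key=lambda q: q[0])
    if not ordered:
        return ""
    if mode == "lines":
        # One quotation per line, as an online dictionary might show them.
        return "\n".join(f"{year} {text}" for year, text in ordered)
    if mode == "run_on":
        # Continuous text, as in a printed sense paragraph.
        return " ".join(f"{year} {text}" for year, text in ordered)
    if mode == "hidden":
        # Quotations suppressed; only their date range is shown.
        first, last = ordered[0][0], ordered[-1][0]
        return str(first) if first == last else f"{first}-{last}"
    raise ValueError(f"unknown mode: {mode!r}")


sample = [(1755, "second example"), (1567, "first example")]
collapsed = render_quotations(sample, mode="hidden")
```

Keeping the stored quotations separate from their rendering in this way is what allows an online edition to hold more quotations than a printed page could reasonably display.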

Many factors determine the selection of quotations within a sense paragraph. If there 
are only a few quotations available they may all be cited, but usually a lexicographer will 
take many, possibly competing, factors into consideration in deciding what to include. 
Each quotation must, of course, be placed in the semantically or grammatically correct 
sense paragraph (if there is more than one). This is not always easy, as some quotations 
may be hard to interpret sufficiently clearly or be too short to provide enough context to 
disambiguate the meaning. Entries in other dictionaries often fall into this category, as 
has been mentioned (Section 11.2, above). There may also be textual or palaeographical 
problems which may obscure the meaning or make the exact form of the word being 
illustrated uncertain. 

The editor will attempt to provide a variety of quotations from various periods, 
usually providing the earliest quotation even if it would not otherwise meet the usual 
criteria for inclusion, as historical dictionaries are usually expected to provide the earli- 
est attestation (see Sheidlower 2011: 200-3). By providing quotations from a number 
of periods, the user can determine the period when the word was used in a particular 
sense. One or two quotations could be chosen for each century of the word’s existence, 
as well as one of the latest attestations, in order to indicate the chronological extent of 
the particular sense. Sheidlower (2011: 203), however, warns against the temptation to 
use glossarial examples to postdate an obsolete term, which would be better treated by 
adding a label such as ‘Now hist.’ to the definition.
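
The idea of a chronological spread (the earliest attestation, one or two quotations per century, and one of the latest attestations) can be expressed as a mechanical first pass. The sketch below is a simplified illustration under stated assumptions, not a procedure taken from Sheidlower or from any dictionary's editorial policy: the function name `select_spread` is hypothetical, quotations are reduced to (year, text) pairs, and a 'century' is taken to be a block of years sharing the same hundreds (1500-1599, and so on).

```python
from collections import defaultdict


def select_spread(quotations, per_century=1):
    """Pick a chronological spread from a list of (year, text) pairs."""
    if not quotations:
        return []
    ordered = sorted(quotations, key=lambda q: q[0])

    # Group the quotations by century block, preserving date order.
    by_century = defaultdict(list)
    for quotation in ordered:
        by_century[quotation[0] // 100].append(quotation)

    # Take the first few quotations of each century; the earliest
    # attestation is therefore always included.
    chosen = []
    for century in sorted(by_century):
        chosen.extend(by_century[century][:per_century])

    # Add the latest attestation if it has not already been chosen.
    latest = ordered[-1]
    if latest not in chosen:
        chosen.append(latest)
    return chosen


evidence = [
    (1567, "a"), (1588, "b"), (1595, "c"),
    (1620, "d"), (1755, "e"), (1790, "f"),
]
spread = select_spread(evidence, per_century=1)
```

A pass of this kind only narrows the field; the choice between candidate quotations within a period still rests on the editorial criteria discussed in this section, such as clarity of context and typicality of use.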

A quotation may be chosen to exemplify a particular part of speech or a morphologi- 
cal form such as a plural form of a noun, or to show its gender, or, in the case of a verb, 
to show a variety of conjugated forms. Historical dictionaries may have widely differing 
policies on how thoroughly they provide quotations to illustrate such forms as well as 
spelling variants (see Sheidlower 2011: 211). 

Examples of figurative use may be cited to give the reader an idea of the scope of figu- 
rative use, especially if such uses are not explicitly stated, but rather a tag such as ‘also 
fig’ is used. 
A ‘good’ quotation should preferably occur in a context that helps to define the word,
or that adds to our knowledge of the word by providing extra information about its 
origin or use. A quotation might also be chosen because it contains some interesting 
historical or cultural information or because it is amusing or well known. Or it may 
be chosen precisely because it seems to illustrate ‘typical’, ordinary use. As mentioned
above, care needs to be taken when citing from other dictionaries, because the evi- 
dence contained in such works is essentially a secondary source. However, it can be 
useful to cite a translation from a foreign work, citing also the word or phrase in the 
foreign language which corresponds to that cited, because this can help to indicate a 
specific meaning. 

In the case of fairly common words, it can be tempting to cite the more unusual quotations,
particularly as reading programmes often produce a disproportionately large
number of such quotations. However, this may give the reader a distorted understanding
of the normal use of the word and should be avoided. The editor should not feel compelled
to illustrate every subsense listed in the definition, as this would increase the size
of some entries to unmanageable proportions, and, as Sheidlower (2011: 206-8) points
out, could mislead the reader to believe that some ‘marginal or esoteric’ uses are more 
significant than they really are. 

Some dictionaries may differentiate between literary quotations and colloquial 
speech, perhaps placing the latter in a separate section. This is particularly true in the 
case of languages where the literary language has diverged considerably from the col- 
loquial language, as in the case of Welsh. In fact, the colloquial speech illustrated may 
occur in a printed source, for example in an academic study of a dialect. Editors may 
append some apposite information at the end of a sense paragraph about the use of the 
word in question in place names or personal names, or perhaps in the title of a play or a
television programme. 

Selecting suitable quotations for inclusion in a dictionary entry is a difficult and 
time-consuming task, and one which calls for great experience on the part of the editor. 
In the case of entries with few quotations, it may be tempting to include so large a pro- 
portion of them that the user is given the mistaken impression that the use of the word 
is far more common than is actually the case. In such cases, some dictionaries append 
explicit labels such as ‘rare’ or ‘obsolete’ for particular sense paragraphs or individual 
senses, although applying such labels consistently can be difficult. 

Sheidlower (2011: 208-10) lists the many, and understandable, reasons why editors
may be tempted to include too many quotations, but he also points out the drawbacks
of such an approach, which include the time and effort involved in verifying and
proofreading them, and notes that it is best not to include them in the first place. Subsequent rounds of
editing are likely to trim overlong quotations and to exclude those considered to be
unnecessary or confusing. New quotations which have come to light may be added,
or previously excluded evidence may be included on the grounds that it has a significant
contribution to make. The end result should be a concise collection of carefully
selected quotations which illustrate the core meaning defined in that particular quotation
paragraph.


11.17 CONCLUSION


Quotation evidence is a prerequisite of historical lexicography. The contents of a histori- 
cal dictionary are based principally on the available quotation evidence. The definitions 
of the senses are determined by the lexicographers’ interpretation of the quotations 
and some quotations are cited to supply evidence to further enhance the definitions. 
Defining senses and selecting quotations are therefore inextricably linked. 

Online dictionaries offer the historical lexicographer many new possibilities for linking
information which has hitherto had to be presented in a linear fashion. The technology
can solve a number of problems where the lexicographer had to choose which
quotation evidence it would be most useful to cite for most readers: now it is possible to 
let the readers choose for themselves. Similarly where it was formerly impossible or very 
difficult to cite certain types of material, digital techniques open the possibility of creat- 
ing a digital archive to contain such material, making it possible for users to use it whilst 
also preserving the material into the future. The linking of quotations with online elec- 
tronic texts also greatly increases the potential usefulness of an online historical diction- 
ary. Such developments, however, come at a cost: lexicographers, and those who manage 
such projects, will have to decide not only what it is possible to provide, but what can be 
afforded in terms of both time and resources within the project's budget. 


12.1 INTRODUCTION


It is perhaps a truism to say that ‘changes in information technology have revolutionized
the field [of lexicography]’ (Dollinger 2010a: 101), in respect to both the collection of
data and the construction of dictionaries. The use of the electronic corpus has been the 
standard for over thirty years, with automatic searches replacing, or supplementing, cita- 
tion slips amassed through more traditional reading programmes (see Hawke, this vol- 
ume). The lexicographic corpus is ‘any collection of text in electronic form ... designed 
specifically for use in the creation of dictionaries’ (Atkins and Rundell 2008: 54) or the 
‘totality of the sources that are systematically excerpted for a certain dictionary (i.e. the 
dictionary basis)’ (Svensén 2009: 44).1 The necessary features of a well-designed lexicographic
corpus have been extensively documented (see, for example, Landau 2001:
273-342; Sinclair 2003; Atkins and Rundell 2008: 53-96; Prinsloo 2009; Kupietz, this
volume). Foremost are the size and ‘representativeness’ of the corpus. While lexicographic
studies require ‘particularly large corpora’ (Landau 2001: 332), size is almost not
an issue anymore (Atkins and Rundell 2008: 57) as more and more electronic resources 
have become available (either freely or through subscription), such as electronic books 
from Project Gutenberg and the Oxford Text Archive, electronic journals from Project 
MUSE, and a range of news, business, and legal publications from LexisNexis Academic, 
not to mention textual material from the Web itself. Commercial publishers have had 
the resources to compile vast corpora, such as the Bank of English (Collins Cobuild),
with 650 million words, the Cambridge English Corpus, with over 1 billion words, and
the Oxford English Corpus, with 2 billion words. The Oxford English Corpus, in fact, ‘is
based mainly on material collected from pages on the World Wide Web’.2 The notion of a
‘representative’ corpus has come into question, now usually replaced with the concept of a
‘balanced’ corpus, that is, a corpus that includes a wide variety of text types exhibiting a
range of topics, styles, and registers, that gives a proportionate place to both spoken and
written language, that is constructed from text samples, in appropriate size and number
(rather than complete texts), and that includes different varieties (if appropriate).
The corpus should provide a ‘genuine—and inclusive—snapshot of the language’ (Atkins
and Rundell 2008: 56).


1 Svensén (2009: 44) notes also that the term ‘corpus’ is used in lexicographic contexts to refer to the
citation files compiled by excerpting segments of text.

For smaller, more specialized, or non-commercial dictionary projects that often 
rely on public funds, however, the compilation of a lexicographic corpus de novo may 
be impossible because of the cost, labour-intensive nature, and multi-year span of the 
activity. Moreover, because of changes in the field of linguistics, lexicographic projects 
are often seen as ‘database’ projects, with no gain in knowledge (Dollinger 2010: 101), 
thus making funding even more problematic. Historical dictionaries pose particu- 
larly severe problems in respect to data collection and dictionary construction. Even 
the well-funded Dictionary of Old English, for example, was first required to digitize all 
extant Old English texts, and in its twenty-five years of exhaustive work has published 
only eight of twenty-two letters. 

In this chapter, I examine whether—in place of constructing one’s own lexicographic 
corpus—it might be possible to use the many extant electronic text collections and cor- 
pora, which are increasing daily, in a responsible way to construct a dictionary. The 
focus here will be on historical dictionaries. The chapter will question whether, in the 
time of diminished resources and perhaps diminished importance placed on historical 
dictionaries both in the publishing world and in the academic (linguistic) world, the 
historical dictionary has become an unaffordable luxury or whether it might be possi- 
ble to use electronic resources to ‘make a virtue of necessity’ (Dollinger 2010: 102). This 
chapter will use the ongoing revision of the Dictionary of Canadianisms on Historical 
Principles (DCHP-1; Avis et al. 1967) as a case study.

Before turning to this study in detail, I will briefly examine in general the historical 
text databases and historical corpora available to the lexicographer (of English). 


12.1.1 Historical Text Databases 


Commercial products such as Early English Books Online (EEBO) (including over 
125,000 titles published in the United Kingdom from 1473 to 1700) or Eighteenth Century 
Collections Online (including 136,291 titles published in the United Kingdom in the 
eighteenth century) provide invaluable text collections. A large percentage of the texts 


2 <http://oxforddictionaries.com/page/oeccompstructure> (accessed 19 June 2014). 
of EEBO are now fully searchable,³ although allowing only basic searches. Of course, the 
historical texts freely available through such electronic archiving initiatives as Project 
Gutenberg or the Oxford Text Archive could be mined for the creation of lexicographic 
corpora if funds and labour were available. For example, these sources were used to cre- 
ate The Corpus of Late Modern English, version 3.0, a part-of-speech tagged corpus of 
British texts from a variety of genres dating from 1710 to 1920. But at 34 million words it 
is still small for dictionary-writing purposes. 

The most obvious sources for lexicographic data—both synchronically and dia- 
chronically—are the electronic archives of newspapers (Atkins and Rundell 2008: 61). 
However, in lexicographic practice, the use of newspaper collections, despite their easy 
availability,⁴ has been criticized, primarily because such collections do not constitute 
a balanced or representative corpus. As Atkins and Rundell (2008: 62) note, newspapers 
are exclusively written texts that all belong to the same genre, ‘journalism’; they are 
all from a specific, rather short time period, and they are limited in topic/subject matter 
and style and register. However, some of these criticisms are not borne out. First, 
we now have extensive temporal ranges for many newspapers. The ProQuest Historical 
Newspapers collection includes forty-four newspapers covering the period 1764-2011 
(these are primarily US newspapers such as The Boston Globe, The New York Times, and 
The Washington Post, but also some international newspapers such as The Scotsman and 
The Times of India).⁵ The Times (London) is available electronically from 1785 to 2009. 
PaperofRecord.com, ‘the world’s largest searchable archive of historical newspapers’, 
includes a vast array of English and non-English language newspapers from around 
the world, many dating back to the mid- or late-nineteenth century. Early British news- 
papers (1661-1791) have been collected in the Zurich English Newspaper Corpus (ZEN) 
(1.6 million words). Second, many newspaper collections can and do include both large 
national papers and smaller local papers, thus leading to a range of subjects, styles, and 
registers as well as regional variation. The daily press in fact covers a great many subjects 
(Landau 2001: 331; Svensén 2009: 47).⁶ Third, while entirely written, newspapers often 
include considerable quantities of recorded/represented speech.⁷ Finally, newspapers 


3 The Text Creation Partnership (TCP), an initiative begun in 1999 as a partnership between 
ProQuest, the University of Michigan, and partner institutions, is in the process of converting the 
original page images of EEBO into TEI-compliant SGML/XML texts. As of September 2011, EEBO-TCP 
has released 32,957 searchable full-texts (Rebecca A. Welzenbach, p.c.). See <http://eebo.chadwyck.com/ 
marketing/about.htm>. 

4 These archives are typically not freely available online, but must be accessed via subscription. 

5 Many institutions subscribe to only a subset of these newspapers. 

6 Landau (2001: 296), while rejecting the notion that 100 million words from the New York Times 
would constitute an acceptable corpus, admits that ‘given the variety of writing in a large newspaper 
such as The Times, even 100 million words from The Times would be better than the source material most 
dictionaries have had to work with in the past’. Of course, newspaper archives are orders of magnitude 
larger than 100 million words. 

7 Mair (2006) observes that in the period 1960-90 newspapers show an increase in passages of direct 
speech, which he attributes to the ‘colloquialization’ of written English: ‘The intended stylistic effect is to 
make the texts appear more dramatic, interesting, and accessible’ (2006: 188). 
can be seen as a highly dynamic medium and one that is much more linked to the spread 
of changes than other written media. 


12.1.2 Historical Corpora 


Until recently, balanced historical corpora of English have been much too small to serve 
lexicographic purposes. As Svensén (2009: 45) notes, ‘So far, existing corpora are hardly 
suitable as the only sources for historical-diachronic investigation, since in such contexts 
they can only scratch the surface’. The premier Helsinki Corpus of English Texts 
contains only 1.5 million words ranging from Old English to Early Modern English. 
ARCHER: A Representative Corpus of English Historical Registers 3.2, though covering 
a shorter time span (1650-1999), again includes only 2 million words of text. Other, 
more specialized historical corpora of English are even smaller (see Kytö 2012 for a 
complete description of available historical corpora of English). Recently, however, the 
400-million-word Corpus of Historical American English (COHA), a balanced corpus 
covering the period 1810-2009, has become freely available on the web; this provides 
the first suitably large historical corpus of a variety of English for lexicographic pur- 
poses. The Proceedings of the Old Bailey Online, 1674 to 1913 is also now freely available. 
Although not a balanced corpus in any strict sense, it is ‘the largest body of texts detail- 
ing the lives of non-elite people ever published’ (<http://www.oldbaileyonline.org/ 
index.jsp>; accessed 16 November 2011). 

I now turn to my case study of a historical dictionary project in which pre-existing 
electronic text databases and corpora are being exploited in two ways: in the collec- 
tion of quotations illustrating the meanings of suspected new Canadianisms, and in 
the determination of the status of lexical items as either characteristic of or more fre- 
quently used in Canada, that is as ‘Canadianisms’ (see below, Section 12.2.4.2). I will also 
briefly exemplify the use of Google Advanced Search on the Web as a means to establish 
a word's status as a regional variant. 


12.2 DICTIONARY OF CANADIANISMS ON 
HISTORICAL PRINCIPLES 


DCHP-1 was one of the first historical dictionaries for a regional variety of English. 
Published in the year of Canada’s centennial (1967), it was a national success. However, 
although word slips for an updated dictionary were collected by Walter Avis and 
Matthew Scargill, no revision was ever undertaken. In 2006, Stefan Dollinger assumed 
directorship of the project (see Dollinger 2006), which has been ongoing at the English 
Department, University of British Columbia, Vancouver, since that time. 
From the beginning, the intention was to carry out this project in a completely online 
environment, in part because of a lack of large-scale funding for this project⁸ but also in 
order to move the project completely into the electronic age. One of the two aims of the 
project has already been achieved, that is, an online version of the original dictionary. The 
print edition was scanned, proofread, and corrected by a team of students, and mounted 
online using our own software. DCHP-1 Online (Dollinger et al. 2011) contains all of the 
text of the print edition as well as its original illustrations (see Dollinger 2010b: 251-2; 
Brinton et al. 2012). The online edition is now available free of charge (<http://dchp.ca/ 
DCHP-1/>). 


12.2.1 Second Edition 


Work on the second edition of the Dictionary of Canadianisms on Historical Principles 
(DCHP-2; Dollinger et al. forthcoming) began in 2006 with data collection. Funding 
limitations prevented us from undertaking a large-scale reading programme, nor did 
we have the staff or resources to construct a lexicographic corpus of historical Canadian 
English.⁹ However, we could use the copious amounts of Canadian English in electronic 
form (see Section 12.2.2) and employ primarily student labour. 

Preparation of the second edition involves a number of stages: 


1. Identification of potential new Canadianisms 
Lists of Canadianisms from synchronic dictionaries of Canadian English (such 
as the ITP Nelson Canadian Dictionary [Friend et al. 1997], Canadian Oxford 
Dictionary [Barber 2004], or Collins Canadian Dictionary [Black et al. 2010]) are 
supplemented by our own searches for newly appearing words (e.g. grow op) or 
older words omitted from the DCHP-1 (e.g. garburator). 

2. Collection of data 
Students collect data from Canadian (primarily electronic) sources (see Section 
12.2.2) on the potential new Canadianisms. Both students in undergraduate 
classes and those employed on the project have participated in data collection. 
To the extent that the sources permit, students are required to collect data from 
all ten provinces and three territories in ten-year or twenty-five-year time peri- 
ods (depending on the first appearance of the word) to the greatest possible time 
depth. 


8 We wish to acknowledge institutional support, especially from the University of British Columbia, 
Faculty of Arts and Department of English, and from the Social Sciences and Humanities Research 
Council of Canada (see <http://faculty.arts.ubc.ca/sdollinger/dchp2.htm>). 

9 Only one corpus of historical Canadian English exists, namely Stefan Dollinger’s CONTE-pC 
(Corpus of Early Ontario English, pre-Confederation section 1776-1849). It includes diaries, letters, and 
local newspapers and totals 125,000 words. It is too limited in size and geographical/temporal range to 
serve the purposes of our project. 
FIGURE 12.1 Sample entry in the Bank of Canadian English for loonie 


3. Recording of citations in an online database 

Data collected are entered directly into the online database, the Bank of Canadian 
English (BCE; Dollinger et al. 2006-).¹⁰ No paper slips are used. The BCE includes 
the citations from DCHP-1 (some 30,000 citations) and the newly collected cita- 
tions, for a total of 74,578 citations (as of 15 July 2015). Only a selection of these 
44,578 updated citations will be used in the DCHP-2: some words will not qualify 
as Canadianisms (see point 4), and in other cases the number of citations listed 
in the BCE is much larger than needed for a dictionary definition. A particularly 
well-researched word such as Canuck has 217 entries; even the less-thoroughly 
examined words have multiple citations: loonie has 53 citations, Red River cereal 
16, and creamo 22. Figure 12.1 shows one citation in the BCE for loonie, ‘a one-dollar 
coin depicting a loon, and by extension, Canadian currency or the value of the 
Canadian dollar’. 

Note that in contrast to the OED3 quotations database, the citations database 
for DCHP-2 was, from the beginning, intended to serve as a linguistic database 
of historical Canadian English (Dollinger 2006: section 3.2), serving a variety of 
purposes for linguistic research. This accounts for the large number of quotations, 


10 For more detailed descriptions of the BCE, see Dollinger (2010a: 103-6, 2010b: 253-7). 
and longer length of quotations than is normal for dictionary citations (Brinton et 
al. 2012). 

4. Determination of a word as a Canadianism 
Obviously, this involves wide consultation of print dictionaries of regional vari- 
eties of English. But, as will be shown in Section 12.2.2, once again electronic 
sources, especially of American and British newspapers, can be an invaluable 
source, especially in the case of very recent forms. 

5. The drafting of entries 


The creation of dictionary entries will make use of our online Dictionary Editing Tool 
(see Dollinger 2010a: 110-11, 2010b: 258-59). This phase is well underway. For a full 
overview of the project (up to 2012), see Dollinger et al. (2012). 


12.2.2 Electronic Sources 


Of primary importance for the DCHP-2 have been historical newspaper archives. 
Canadian Newsstand provides full-text access to nearly 300 Canadian newspapers 
from all provinces and territories from 1977 to the present. These include both major 
urban daily newspapers as well as local weekly newspapers. Canada’s Heritage from 
1844—The Globe and Mail includes full digital images of ‘Canada’s national newspa- 
per’ from 1844 to the present, and Toronto Star-Pages of the Past likewise makes avail- 
able digitized copies of this newspaper from 1894 to 2010. Because of the difficulty of 
searching the full-image pdf (Portable Document Format) pages (with frequent false 
hits, especially in the earlier periods), Canadian Newsstand has served as the primary 
newspaper source. 

Digitized collections of student newspapers and university publications represent 
an important but sometimes overlooked resource. For example, the University of 
British Columbia Library Archives have digitized a number of university publications, 
including the student newspaper, the Ubyssey, from 1918 to 2012. This allows full-text 
search and can provide antedatings or other important information. For example, the 
1969 use of reading week in the Ubyssey (‘It also vetoed requests for a “reading week” 
each spring which would be devoted to individual study’) nearly contemporary with 
the 1971 example in the Globe and Mail recorded in the BCE shows us that the word 
has wide distribution from Western Canada to Central Canada.¹² Another important 
student newspaper source is The Gateway from the University of Alberta, available in 
digital format from 1910 to the present. (Archives of student newspapers from Ontario, 
New Brunswick, Nova Scotia, Quebec, Saskatchewan, Manitoba, and Newfoundland 
also exist.) 


11 For a fuller description of the software suite involving the BCE and Dictionary Editing Tool, see 
Dollinger (2010b). 
12 Cf. Dollinger (2010a: 106-7) on reading week. 
Smaller digitized newspaper collections can also be useful, especially for specialized 
terms. For example, the newspaper of Nunavut, the Nunatsiaq, can be searched from 
1995 to the present. Thirty-five British Columbia local newspapers dating from 1865 to 
1924 have been digitized by the University of British Columbia library and are available 
in open access. 

Our digital resources have not been limited entirely to journalistic ones. A project 
of the Government of Canada to digitize historical documents is another important 
resource. Early Canadiana Online contains a digitized selection of the microfiche col- 
lection of the Canadian Institute for Historical Microreproductions. Early Canadiana 
Online currently offers twelve online collections totalling over 3 million pages and is 
continually expanding its content. The collections include material published from 
the time of the first European settlers to the first two decades of the twentieth century. 
Publications as diverse as early Canadian periodicals and official publications, Hudson's 
Bay Company records, and English Canadian literature are included. Another important 
source is The Champlain Society Digital Collection. This consists of 101 of the Champlain 
Society's volumes (almost 50,000 printed pages) dealing with exploration and discov- 
ery in Canada over three centuries (the sixteenth-nineteenth centuries). It includes, for 
example, first-hand accounts of Samuel de Champlain’s voyages in New France as well 
as the diary from Sir John Franklin’s first land expedition to the Arctic, 1819-22. (Both 
French and English documents are included in these collections, but they can be searched 
separately.) 

Digitized (historical) spoken data remain limited, except for transcripts of (semi-) 
scripted television news, including the CTV National News (1994-2000), CTV 
News—CTV Television (1997-present, some gaps), and Global News Transcripts 
(2003-present). It should be noted that such material, although not spontaneous, is 
not ‘sanitized’ to look like written material ‘so it retains the feel of spontaneous speech’ 
(Atkins and Rundell 2008: 77).¹³ 

There are two synchronic corpora of Canadian English available electronically. Both 
are balanced corpora containing written and spoken data. The ICE-Canada corpus is 
a balanced corpus of spoken (scripted and spontaneous, monologic and dialogic) and 
written texts (academic, non-academic, reportage, instructional, persuasive, and creative 
writing). However, at 1 million words, it is much too small to be of use for lexicographic 
purposes. The Strathy Corpus of Canadian English (now available online) 
contains around 60 million words of written and spoken Canadian English. It includes 
newspapers, magazines, biographies, historical writings, academic theses and journals, 
transcripts of university classes, Internet news, and so on. Although it contains some 
spontaneous face-to-face conversation, much of the spoken component consists of 
parliamentary proceedings, transcripts of public inquiries, broadcast news, and other 
scripted material. 


13 A more complete listing of the electronic resources of historical Canadian English can be found in 
Dollinger (2011a). 
Table 12.1 Antedatings 

                              DCHP-1    OED     BCE 
chesterfield                  1903      1900    1890 
baby bonus                    1957      1945    1905 
street hockey                 1964      1953    1904 
toque/tuque                   1882      1870    1858 
faceoff [hockey term]         1896      1889    1893 
treaty Indian                 1881      1876    1862 
lake boat                     1936      -       1837 
Toronto couch                 1962      -       1939 
English Canadian              a1829     -       1785 
home ice [hockey term]        1955      -       1897 
country wife                  1948      -       1845 
sweep-check [hockey term]     1966      -       1933 
parkade                       1958      1950    1957 
sovereignist                  -         1982    1971 
pure laine                    -         1991    1962 


1870 (in Fr. text) 


The BCE also contains a ‘Source Documentation Tool’ in which the results of searches 
in the different electronic (and print) sources are recorded for each word researched (see 
Dollinger 2010b: 257-8 for a description). 


12.2.3 Antedatings 


Using our electronic sources, we have been successful in a number of instances in 
antedating previous sources (see Table 12.1). In some cases, we have found antedatings 
for both the DCHP-1 and (in most instances unrevised, second edition) OED 
entry (e.g. chesterfield ‘sofa, couch’, baby bonus ‘monthly allowance paid to parents 
of dependent children’,¹⁴ street hockey, treaty Indian ‘an aboriginal person whose 
tribe or band has signed a treaty with the government’, toque/tuque ‘knitted cap’). 
At other times we have been able to antedate the DCHP-1 when the word is not in 
the OED (lake boat ‘boat used on the Great Lakes’, Toronto couch ‘couch that can be 
opened into a double bed’, English Canadian, home ice, country wife ‘Indian or Métis 
common-law wife of a trader’, sweep check). We have come very close to the earliest 
date in OED3 while at the same time antedating the DCHP-1 in the case of faceoff and 
parkade ‘parking structure’, and we have been able to antedate OED3 when the word 
is not in DCHP-1 in the case of sovereignist ‘an advocate of Québec’s right to self-government’ 
and pure laine ‘a Québécois descended from the original French settlers’. 
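
The comparison summarized in Table 12.1 comes down to simple date arithmetic. The following Python sketch is offered only as an illustration and is not part of the project's software: the dates are transcribed from Table 12.1, while the names TABLE_12_1, antedating, and antedates_all are hypothetical. It assumes that a missing dictionary entry can be represented as None and that the approximate date a1829 can be treated as 1829.

```python
# Earliest attestation years transcribed from Table 12.1 as (DCHP-1, OED, BCE).
# None marks a word that is not recorded in that dictionary.
TABLE_12_1 = {
    "chesterfield": (1903, 1900, 1890),
    "baby bonus": (1957, 1945, 1905),
    "street hockey": (1964, 1953, 1904),
    "toque/tuque": (1882, 1870, 1858),
    "faceoff": (1896, 1889, 1893),
    "treaty Indian": (1881, 1876, 1862),
    "lake boat": (1936, None, 1837),
    "Toronto couch": (1962, None, 1939),
    "English Canadian": (1829, None, 1785),  # the table gives a1829; treated as 1829 here
    "home ice": (1955, None, 1897),
    "country wife": (1948, None, 1845),
    "sweep-check": (1966, None, 1933),
    "parkade": (1958, 1950, 1957),
    "sovereignist": (None, 1982, 1971),
    "pure laine": (None, 1991, 1962),
}


def antedating(word):
    """Return the number of years by which the BCE date precedes each dictionary.

    A negative value means the dictionary's first quotation is earlier than
    the BCE's; None means the word is not recorded in that dictionary.
    """
    dchp1, oed, bce = TABLE_12_1[word]

    def gain(year):
        return None if year is None else year - bce

    return {"DCHP-1": gain(dchp1), "OED": gain(oed)}


def antedates_all(word):
    """True if the BCE date is earlier than every dictionary date on record."""
    gains = [g for g in antedating(word).values() if g is not None]
    return all(g > 0 for g in gains)
```

On these figures, chesterfield is antedated in both dictionaries, whereas faceoff is antedated only with respect to the DCHP-1, which matches the grouping of the words in the paragraph above.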


14 OED3 has a 1909 US quotation and a 1912 Australian quotation; the 1945 quotation is taken from a 
Canadian author of Harlequin romances. 
12.2.4 Case Studies: garburator, gas bar, grad, government 
wharf, and GIC 


In the remainder of the chapter, I will illustrate the use of electronic sources, both for the 
collection of citations and the determination of a word’s status as a Canadianism using 
five lexical items from the letter ‘G’: 


• garburator (‘in-sink garbage disposal unit’) (also spelled garburetor, garberator); 
• gas(oline) bar (‘gasoline or service station, often with a convenience store’); 
• grad (‘graduation ceremony or dinner/dance’); 
• government wharf (‘public wharf’); 
• GIC (= guaranteed investment certificate, ‘fixed term bank account’). 


This letter was first the subject of study by Pratt (2006) and later included in studies by 
Dollinger and Brinton (2007, 2008). The ‘suspected’ Canadianisms to be investigated here 
are all of fairly long standing, but none is recorded in the DCHP-1. All are defined and iden- 
tified as Canadianisms in the Canadian Oxford Dictionary (2004), but not in the other 
synchronic dictionaries of Canadian English, except the Collins Canadian Dictionary 
(2011), which identifies garburator as a Canadianism¹⁵ and lists GIC, but gives it no dialectal 
label. In his paper, where he used the Canadian Oxford Dictionary only ‘sporadically’, 
Pratt concludes that garburator, government wharf, and GIC should be included in an 
updated DCHP, but misses grad and gas bar. Government wharf is listed in the Dictionary 
of Newfoundland English (Story et al. 1999) (s.v. government n (supp)), but none of the 
words is recorded in Pratt’s Dictionary of Prince Edward Island English (1996), the only 
other scholarly regional dictionary of Canadian English. Finally, of the five forms, only 
garburator and gas bar are currently to be found in OED, labelled as ‘Chiefly Canad.’ and 
‘Canad.’ respectively. The four citations accompanying the OED entry for garburator 
are all from Canadian sources, two from Canadian newspapers, one from a text published 
in Toronto, and one from N. A. T. Grant, a writer of Canadian spy thrillers. 


12.2.4.1 Collecting Citations 


The results of our research and data collection for the five sample words are summarized 
in Table 12.2. Canadian Newsstand yields large numbers of examples of gas bar and guar- 
anteed investment certificate (GIC) especially, and reasonable numbers of examples of 
government wharf and garburator. Gas bar is strongly preferred over gasoline bar, and 
the initialism GIC is preferred over the full form, guaranteed investment certificate, albeit 
to a lesser extent. Because of the polysemy of grad (especially its use in the sense ‘graduate 
of a school, college, or university’ or ‘student in a graduate program’), the search 
was restricted to two quite common collocations, safe grad and dry grad (‘a graduation 


15 Garburator is included among Thay’s (2004) ‘weird Canadian words’ and appears on websites of 
Canadian slang, e.g. <http://www.canadaka.net/content/page/12.4-canadian-slang--english-words> 
(accessed 31 October 2011). 
party in which no alcoholic beverages are served’). The image-based nature of the Globe 
and Mail database does not allow one to give comparable figures from this source. The 
results presented in Table 12.2 show that despite its respectable size, at least for a corpus 
of a variety of English, the Strathy Corpus is rather too small for the study of lexical items, 
and at 1 million words, ICE-Canada is of very limited usefulness as a lexicographic cor- 
pus. The television transcripts provide a few spoken examples of all of the relevant terms 
except for garburator, this gap likely being the consequence of its unlikely appearance in 
topics covered by television news. 

Our DCHP project researchers, making use of the electronic resources described 
above (Section 12.2.2), have been able to date one of the terms (government wharf) to 
the early 1800s (a period when we first begin to get a reasonable number of Canadian 
English texts), one (guaranteed investment certificate) to the 1950s, two (gas bar and gar- 
burator) to the 1960s, and one (grad) to the 1980s. Table 12.2 summarizes the material for 
these words in the Bank of Canadian English. 

As described in Section 12.3.1, we have used a ‘data extraction scheme’ in the collec- 
tion of data in order to determine the regional and temporal distribution of words. For 
grad, for example, we have collected examples from Alberta, Ontario, and Quebec for 
the first period (1982-92), from Quebec, British Columbia, Newfoundland, and Alberta 
for the next period (1993-2003), and Saskatchewan, New Brunswick, Alberta, British 
Columbia, Ontario, Manitoba, Northwest Territory, and Yukon Territory for the last 
period (2004-11), thus establishing this word as in use over time from coast to coast. 
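
A coverage check of this kind can be sketched in a few lines of Python. The sketch is illustrative only and does not reproduce the project's own tools: the periods and regions are those reported above for grad, but the individual years are invented placeholders, and the names PERIODS, CITATIONS, and regions_by_period are hypothetical.

```python
# Collection periods reported above for grad.
PERIODS = [(1982, 1992), (1993, 2003), (2004, 2011)]

# (year, region) pairs. The regions follow the text; the exact years are
# invented placeholders that merely fall inside the right period.
CITATIONS = [
    (1985, "Alberta"), (1988, "Ontario"), (1991, "Quebec"),
    (1994, "Quebec"), (1997, "British Columbia"),
    (2000, "Newfoundland"), (2003, "Alberta"),
    (2004, "Saskatchewan"), (2005, "New Brunswick"), (2006, "Alberta"),
    (2007, "British Columbia"), (2008, "Ontario"), (2009, "Manitoba"),
    (2010, "Northwest Territory"), (2011, "Yukon Territory"),
]


def regions_by_period(citations, periods):
    """Map each period to the set of regions with at least one citation in it."""
    coverage = {period: set() for period in periods}
    for year, region in citations:
        for start, end in periods:
            if start <= year <= end:
                coverage[(start, end)].add(region)
                break  # periods do not overlap, so stop at the first match
    return coverage


coverage = regions_by_period(CITATIONS, PERIODS)
# Every region attested in at least one period.
all_regions = set().union(*coverage.values())
```

Empty sets in the result would point to period and region combinations for which further searching is still needed.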


12.2.4.2 Establishing Forms as Canadianisms 


Of course, the existence of the five terms in Canadian sources means nothing in and of 
itself. We need to establish that these forms are either unique to Canada or of high frequency 
in Canada in order for them to have status as Canadianisms. More specifically, 
we define Canadianisms as lexical items belonging to one of five categories (Dollinger et 
al. 2012: 171): 


1. Forms originating in Canada (neologisms), earliest attestations in Canada, 
or forms innovated in Canada, such as loonie ‘dollar coin’, tuque, allophone ‘a 
non-native Canadian whose first language is neither French nor English’, ghost car 
‘unmarked police car’, separate school ‘a school receiving pupils from a racial or 
religious minority’, side bacon ‘Canadian bacon’, emerg ‘emergency room’. 

2. Forms or meanings preserved in Canada that have fallen into (relative) disuse in 
other varieties (or were not adopted elsewhere) (e.g. chesterfield, eavestrough ‘roof 
gutter’), including retentions of British terms (e.g. soother ‘pacifier’, serviette ‘napkin’, 
(political) riding, bursary ‘grant given to a student’) that never gained wider 
currency in the United States. 

3. Forms having undergone semantic change in Canada, such as homo ‘full fat milk’ 
[not ‘homogenized milk’], metro ‘any metropolitan city or area, or its local administration’, 
strata ‘condominium’ (also Australian), all-dressed ‘food (such as hamburger 
or pizza) with all the optional garnishes’, acclamation ‘election by virtue of 
being the sole candidate’, blue box ‘blue plastic bin for recyclables’, bell-ringing ‘the 
ringing of bells to summon members of a legislative assembly for a vote’, separatist 
‘one favoring secession of Québec from Canada’, and impaired ‘a driver under the 
influence of alcohol or narcotics’. 

4. ‘Culturally’ significant terms, such as grade 1, 2, 3 (vs. 1st, 2nd, 3rd grade), French 
immersion, treaty rights, francophone, Zamboni ‘ice resurfacing machine’. 

5. Highly frequent forms (not first attested in Canada, not having undergone semantic 
change, not necessarily culturally significant), such as wash room, Chinook 
(wind), credit union ‘savings-and-loan’, residence ‘university dormitory’. 


‘Cultural significance’ is ‘a fuzzy concept, allowing for terms that are of importance to 
the Canadian “psyche”, or perhaps of greater significance to Canadians than to other 
nationalities’ (Dollinger and Brinton 2008: 52). 

Given the existence of large numbers of distinctly North American terms, widely used 
in both the United States and Canada, it is first necessary to check US sources. LexisNexis 
Academic can be searched limiting one’s search to US newspapers and newswires.¹⁶ We 
now also have the freely accessible and user-friendly synchronic and diachronic bal- 
anced corpora of US English designed by Mark Davies, each over 400 million words in 
size: the Corpus of Contemporary American English (COCA) and the Corpus of Historical 
American English (COHA), covering the period 1810-2009. In respect to historical 
newspaper collections, we have the ProQuest Historical Newspapers. 

As can be seen in Table 12.3, both corpus evidence and newspaper evidence point to 
the fact that none of our sample words can be said to be characteristic of US English. 
Garburator never occurs, except sixteen times in reference to a race horse named 
‘Garburetor’ and once in an article in The Chronicle of Higher Education on English in 
Canada (for which I was interviewed!). While dry grad is also not found in US sources, 
safe grad seems to have some currency in the state of California. When found in US newspapers, 
gas(oline) bar almost always refers to a Canadian context. In its abbreviated form, 
it is very frequent in, and almost entirely restricted to, Canada. Guaranteed investment 
certificate is also almost exclusively found in Canada.¹⁷ Government wharf may have had 
wider use in earlier US English, but now seems to have fallen out of use or become contex- 
tually limited. Of the thirteen hits in the ProQuest Historical Newspapers after the 1940s, 
nine refer either to a Canadian context or to the Government Wharf in Kennebunkport, 
ME; twenty-one of the twenty-seven hits in LexisNexis are similarly restricted. 

British English sources show that none of these words has any currency in that vari- 
ety (see Table 12.3). The seventeen examples of government wharf found in the London 
Times denote wharves around the world, but primarily in Africa; the one example in the 


16 LexisNexis Academic has over 1,200 newspapers from around the world, dating back to 1977 (see 
<http://academic.lexisnexis.com/online-services/academic/academic-overview.aspx>). 

17 In the United States, a ‘guaranteed investment contract’ (also abbreviated GIC) is held with an 
insurance company. Apart from false hits, almost all of the examples of GIC(s) in COCA (some 105 
instances) refer to this different financial product. 
British National Corpus refers to a wharf in Fiji. The two examples of guaranteed investment 
certificate in 200 years of The Times (London) both occur in a Canadian context. 

The Web now provides a way to search across varieties of English, and although such 
searches must be treated with a great deal of caution,¹⁸ they can lead to some interesting 
cross-variety comparisons. Using Google Advanced Search, we can restrict our searches 
to particular domains. Most significantly, we can search the .ca (Canada), .au (Australia), 
.nz (New Zealand), .in (India), .hk (Hong Kong), .uk (United Kingdom), .ie (Ireland), 
and .za (South Africa) domains. The United States does not have a comparable domain, 
but .edu is often used (.gov, .mil are also possible). Results of a Google Advanced Search 
for the relevant terms across these eight domains are provided in Table 12.4. 
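
For readers who prefer to type queries directly, the same restriction can be approximated with a search operator. The Python sketch below is illustrative and rests on two assumptions that go beyond the text: that the site: operator with a top-level domain behaves like the domain field of the Advanced Search form, and that exact-phrase quoting is wanted. The names DOMAINS and domain_queries are hypothetical.

```python
# The eight national domains listed above.
DOMAINS = {
    ".ca": "Canada",
    ".au": "Australia",
    ".nz": "New Zealand",
    ".in": "India",
    ".hk": "Hong Kong",
    ".uk": "United Kingdom",
    ".ie": "Ireland",
    ".za": "South Africa",
}


def domain_queries(term, domains=DOMAINS):
    """Build one exact-phrase query string per domain for a search term."""
    return {domain: f'"{term}" site:{domain}' for domain in domains}
```

Hit counts returned for such queries are estimates and vary over time, so they are best recorded together with the date of the search.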

Because the sizes of the different domains differ, the raw frequencies must be nor- 
malized in some way. In order to do so, the frequency of a basic word (with no obvious 
regional variants) is tabulated for each domain, and the size of the domain is estimated 
on this basis (using the size of the Canadian domain as a basis). Here, I tabulated the frequency 
of the modal auxiliary could¹⁹ and of the basic verb take (which proved to be very 
similar, see Table 12.4) and used an average.²⁰ Comparing the normalized frequencies, 
we can conclude that garburator, gasoline bar, grad, and guaranteed investment certificate 
appear to be clear Canadianisms. For example, garburator has a normalized frequency 
of 332,000 in Canadian English; Indian English, with the next highest count, has a normalized 
frequency of only 1,239. The normalized frequency of gasoline bar in Canadian 
English is 15,600, compared with Australian English with 271. Only government wharf 
shows up in another dialect of English, namely New Zealand English, where it is 5.7 times 
more frequent (normalized frequency 226,485) than in Canadian English (normalized 
frequency: 40,000). 
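
The normalization step can be made concrete with a short Python sketch. It is one possible reading of the procedure, not a reproduction of the calculation behind Table 12.4: it assumes that a domain's relative size is the mean of its basic-word hit counts divided by the corresponding Canadian counts, and that a normalized frequency is the raw hit count divided by that relative size. The function names and all hit counts in the example are invented for illustration.

```python
def estimate_relative_sizes(basic_counts, reference=".ca"):
    """Estimate each domain's size relative to the reference domain.

    basic_counts maps a domain to {basic_word: hit_count}. The size is the
    mean, over the basic words, of the domain's count divided by the
    reference domain's count (an assumption about how the average is taken).
    """
    ref = basic_counts[reference]
    sizes = {}
    for domain, counts in basic_counts.items():
        ratios = [counts[word] / ref[word] for word in ref]
        sizes[domain] = sum(ratios) / len(ratios)
    return sizes


def normalize(raw_hits, sizes):
    """Scale raw hit counts to what they would be in a reference-sized domain."""
    return {domain: raw_hits[domain] / sizes[domain] for domain in raw_hits}


# Invented counts: the .nz domain is a quarter of the size of .ca.
basic = {
    ".ca": {"could": 100, "take": 200},
    ".nz": {"could": 25, "take": 50},
}
sizes = estimate_relative_sizes(basic)
normalized = normalize({".ca": 400, ".nz": 500}, sizes)
```

With these invented counts the .nz figure is scaled up fourfold, so a term with fewer raw hits in a small domain can still turn out to be relatively more frequent there. The same ratio logic underlies the comparison reported above for government wharf, where 226,485 divided by 40,000 gives roughly 5.7.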

Based on our research using electronic sources, therefore, we would conclude as follows: 


• Garburator is apparently a ‘type 1’ Canadianism, first attested in Canada and likely 
innovated in Canada. However, because the origin of the term is unknown and the 
implement itself was a US invention, we must exercise a degree of caution concerning 
this classification.²¹ 


18 On the use of the Web as a lexicographic corpus, see Grefenstette (2002) and Kilgarriff and 
Grefenstette (2008 [2003]). While Web data are admittedly ‘dirty’ (unedited and potentially full of errors, 
written by a variety of [unknown] people, perhaps not even native speakers, etc.) and while Web searches 
have problems (covering only a fraction of the pages, indexing pages not text, biased towards North 
America, unspecifiable in respect to linguistic features such as lemmas, providing too little context, and, 
most importantly, statistically unreliable), Kilgarriff and Grefenstette are nonetheless convinced of the 
ultimate usefulness of the Web for lexicographic study. 

19 Mair (2006: 101) shows that could has a very similar frequency in British and American English in 
the 1990s. 

20 For this means for determining the size of the Web, see Grefenstette (2002: 203) and Kilgarriff and 
Grefenstette (2008 [2003]: 93-4). 

21 The details of the invention and marketing of the garburator are unclear. It is thought to have 
been invented in 1927 by John W. Hammes in Racine, Wisconsin, and marketed by his InSinkErator 
company in 1940. However, General Electric seems to have introduced a similar unit in 1935 (see 
<http://en.wikipedia.org/wiki/Garbage_disposal_unit>). 
• Gas(oline) bar is clearly a ‘type 1’ Canadianism. 

• Guaranteed investment certificate (GIC) is a ‘type 1’ Canadianism, though it may 
have had some (minor) earlier currency in US English, thus overlapping with the 
‘type 2’ category. 

• Government wharf belongs to the ‘type 2’ category of Canadianism, an earlier form that is 
preserved in Canada (and in New Zealand and Australia) but that has fallen into 
(relative) disuse elsewhere. 

• Grad is a ‘type 3’ Canadianism. The form (a clipping of graduation) is clearly not 
unique to Canada (in Canada and elsewhere it can refer to a ‘graduate student’), but 
it has undergone semantic change in Canada, coming to refer to the ceremony or 
associated party. 


We would thus decide to include all of these forms in DCHP-2, although noting the use 
of government wharf in New Zealand as well. 


12.3 CONCLUSION 


The advances in electronic text production and availability in the twenty-first century, 
on one hand, combined with limitations in resources (staff, time, and especially pub- 
lic monies), on the other hand, suggest that it may no longer be possible to undertake 
large-scale, multi-year historical lexicographic work, such as has been the case in the 
past, for example, with the Middle English Dictionary or the Dictionary of Old English. 
Especially when dealing with a variety of a language with a shorter time span, such as 
Canadian English, lexicographers can make use of historical databases, especially his- 
torical newspaper archives, and large historical corpora that are now becoming available 
(such as COHA). The wealth of materials that are either freely available on the Web or 
via subscription is increasing daily. The updating of the Dictionary of Canadianisms on 
Historical Principles, which is currently being undertaken in a purely online environ- 
ment making use of pre-existing electronic text databases and historical corpora not 
specifically designed for lexicographic work, has been successful in collecting an impres- 
sive set of data, and has been able to antedate citations gained through more traditional 
means, as were used in the first edition of the dictionary. Furthermore, electronic 
databases and corpora (including the Web as a corpus) have made cross-varietal 
comparisons easily possible and allowed for the establishment of lexical items as characteristic of 
Canadian English. The rather small-scale project reported on in this chapter bodes well 
for the use of electronic resources in the composition of both synchronic and historical 
dictionaries for other varieties of English in the future. 
13.1 GENERAL CONSIDERATIONS 


This chapter examines the challenges of grammatical analysis and documenting 
grammatical change within a historical dictionary, drawing its examples from the Oxford 
English Dictionary. In particular, it uses a comparison of decisions made in the first and 
current (third) editions of the OED as a starting point for identifying key points of dif- 
ficulty in a historical dictionary. 


13.1.1 Grammar Models 


The editors of the first edition of the OED did not codify the grammatical system with 
which they described the meaning and use of the words defined in the dictionary. 
They probably did not think it necessary, assuming that there was a generally accepted 
model of grammar for the English language, based on the traditional classical system as 
adapted during the nineteenth century to the description of English.¹ Lack of a stated 
grammatical model produced a certain degree of variation in the way similar features 
of English were described. Compare the grammatical labelling of his poss. pron., sense 
B. 2: ‘Poss. adj. pron. masc. (orig. poss. gen., and then, like Latin ejus, often following its 


1 The fourth editor, C. T. Onions, wrote a grammatical description of English (Onions 1904). Since 
this was originally part of the ‘Parallel Grammar Series’, which contained grammars of other languages 
arranged to facilitate comparison, Onions’ book may have been influenced in its principles by the 
requirements of that series. Differences from the OED approach can certainly be found: for example, 
OED maintains a strict distinction between verbal nouns and participial adjectives on the one hand 
and gerunds and participles on the other, whereas Onions treats them together, employing the terms 
‘verb-noun’ and ‘verb-adjective’ additionally. Nonetheless, it probably presents a good picture of the 
grammatical thinking that lies behind the OED. 
sb.)’ and their pron., sense 1a: ‘poss. adj. (orig. gen. pl. of pers. pron.)’, or the definition of 
shall v. 8 ‘As a mere auxiliary, forming (with present infinitive) the future, and (with 
perfect infinitive) the future perfect tense’ with that of will v.¹ 11 ‘As auxiliary of the future 
tense with implication of intention or volition’. 


13.1.2 Underlying Principles in OED2 and OED3 


A principle underlying all stages of the OED and applying much more widely than to 
grammar alone is that of unambiguous use, that is, that where a lexical feature has devel- 
oped gradually from one state to another, the later state is regarded as having begun 
when uses are found that can only be so interpreted, so that ambiguous uses are assigned 
to the earlier state. A principle that underlies a number of changes of policy from 
OED1/2 to OED3,³ and applies chiefly to grammatical categorization, is that of discrete 
classification, that is, that when a lexical item appears to lie on a borderline between one 
grammatical category and another, especially where these are standard ‘parts of speech’, 
it must normally be assigned to one or other of those categories and not consigned to 
a terminologically grey area with a special label reflecting its dubiousness. OED2 
tolerated such grey areas, often avoiding precise categorization by employing transitional 
categories. Two minor but frequent transitional categories are the prefixing of quasi- to 
a grammatical label and the use of ‘attrib. passing into adjective’ in noun entries. In 
both cases the equivalent text in OED3 embodies an editorial decision about the distinct 
category that the item so labelled belongs to. 
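
The principle of unambiguous use amounts to a simple dating rule, which the following sketch tries to make explicit. The `Citation` type, its field names, and the idea of recording the possible readings of a quotation as a set are assumptions introduced here for illustration; they do not describe the OED's actual editorial tooling.

```python
from dataclasses import dataclass

@dataclass
class Citation:
    year: int
    readings: frozenset  # states the quotation could instantiate

def assign_state(citation, earlier_state, later_state):
    """Ambiguous citations go to the earlier state; only unambiguous ones count as later."""
    if citation.readings == frozenset({later_state}):
        return later_state
    return earlier_state

def first_unambiguous(citations, later_state):
    """Year of the earliest citation that can only be read as `later_state`, or None."""
    years = [c.year for c in citations if c.readings == frozenset({later_state})]
    return min(years) if years else None

# Hypothetical evidence for a noun developing an adjectival use.
evidence = [
    Citation(1820, frozenset({"noun"})),
    Citation(1850, frozenset({"noun", "adjective"})),  # ambiguous
    Citation(1880, frozenset({"adjective"})),          # can only be adjectival
]
adjective_from = first_unambiguous(evidence, "adjective")
ambiguous_goes_to = assign_state(evidence[1], "noun", "adjective")
```

In this invented example the adjective would be dated from 1880, and the ambiguous 1850 quotation would stay with the noun.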


13.2 LEMMATIZATION 
AND ENTRY ORGANIZATION 


Grammar makes its presence most immediately felt as part of the system for disambigu- 
ating headwords. Obviously, a dictionary has three main options for making distinctions 
between headwords: (1) To distinguish only differently spelt words, ignoring both word 
class and etymology, and treat differences as simply semantic; this option is not appro- 
priate for a historical dictionary. (2) To distinguish homographs by broad semantic and 
etymological criteria, attaching as many homograph numbers as there are homographs, 
whatever their word class membership, which is done by many current-language dic- 
tionaries. (3) To distinguish homographs by word class and within that by semantic and 


2 An observation which there is no space to elaborate here is that entries for grammatical terms are 
generally rather poor in OED1. 

3 Entries cited as from OED3 are those published online that have been fully revised under the current 
programme. Entries cited as from OED Online are the remaining entries, also published online, that have 
not been comprehensively revised but have been subject to numerous global enhancements since their 
appearance in the printed Second Edition. 
etymological criteria; this is followed by the OED and most ‘unabridged’ dictionaries as 
well as many ‘collegiate’ and smaller ones. 

OED3 deploys a strictly delineated closed set of word class labels (‘parts of speech’) for 
distinguishing headwords. The first eight are the traditional parts of speech: 


• noun; 
• adjective; 
• adverb; 
• pronoun; 
• verb; 
• preposition; 
• conjunction; 
• interjection. 


The remaining four exist as a response to the requirements of dictionary organization: 


• phrase (discussed in Section 13.8.5); 
• prefix; 
• suffix; 
• combining form. 
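
A closed set of this kind can be modelled as an enumeration, so that any label outside the twelve is rejected rather than silently accepted. The sketch below is an assumption about how one might encode the list above; the class and function names are hypothetical and the label strings simply repeat the list.

```python
from enum import Enum

class WordClass(Enum):
    NOUN = "noun"
    ADJECTIVE = "adjective"
    ADVERB = "adverb"
    PRONOUN = "pronoun"
    VERB = "verb"
    PREPOSITION = "preposition"
    CONJUNCTION = "conjunction"
    INTERJECTION = "interjection"
    PHRASE = "phrase"
    PREFIX = "prefix"
    SUFFIX = "suffix"
    COMBINING_FORM = "combining form"

# The first eight members are the traditional parts of speech.
TRADITIONAL = frozenset(list(WordClass)[:8])
# The remaining four answer the needs of dictionary organization.
ORGANIZATIONAL = frozenset(WordClass) - TRADITIONAL

def parse_label(label):
    """Return the WordClass for a label; raise ValueError outside the closed set."""
    return WordClass(label)
```

A transitional label such as a quasi- form would raise `ValueError` here, which mirrors the requirement that every headword be assigned to exactly one member of the set.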


Essentially the same set was employed by the first edition. It contained two additional 
regular items, ‘verbal noun’ used for words in -ing based on verbs, used as nouns, and 
‘participial adjective’ for words in -ing and -ed based on verbs, used as adjectives.⁴ 
But there were also a number of sporadic and fluctuating grammatical labels, which 
included: the addition of ‘phrase’ to a part of speech label (e.g. ‘adverbial phrase’ (advb. 
phr.)) for certain multi-word headwords, ‘gerund’, ‘present participle’ (pres. pple.), and 
‘past participle’ (pa. pple.). ‘Latinate’ participles (i.e. words formed on Latin past 
participles, without the English -ed suffix, and used like them) often had the part of speech 
‘past participle’ (see, e.g., coronate pa. pple.): in OED Online, their word class is 
adjective but their use as participles is noted in the definition (compare coronate adj.¹). One 
anomalous feature of the first edition is that nouns, which in that edition were called 
‘substantives’ (sb.),⁵ were not given a part-of-speech label if there were no homonyms 
in other word classes; this might have been unambiguous had it not been that some 
other entry types were sporadically left without part-of-speech labels, examples of this 


4 Participial adjectives could be said to display lexical change in process. They are essentially 
participles being used as adjectives and nouns. Historically they always have parent verbs of which they 
remain instantiations. OED1 showed logic in treating these two classes separately with double-barreled 
names to show their dual nature; though with verbal nouns it was not at first clear about where the 
boundary lay (see Section 13.9). In accordance with its discrete word class principle, OED3 has stripped 
the modifier from the label and merged them with the other adjectives and nouns. 

5 This reflected a classification of nouns and adjectives as ‘noun substantive’ and ‘noun adjective’ 
which must have been obsolescent even in 1884. 
omission including major entries such as plus (OED Online: plus prep., n., adv. and adj.), 
and minor ones like ykeuered (OED Online: ykeuered adj.). 


13.3 GRAMMATICAL CHANGE 
AS AN ASPECT OF LEXICAL HISTORY 


It is a truism that much linguistic evolution occurs by means of grammatical change. 
A shift in use from one grammatical category to another can be regarded as a kind of 
semantic development comparable to the emergence of a new sense. In a historical dic- 
tionary, such a shift can be handled either by treating the new lexical item as a sepa- 
rate entry, or by keeping it within the parent entry but assigning it to a sense division in 
which the development is signalled, or, if it is not felt to be highly significant, by covering 
it with an extension of the definition of the sense from which it has developed. 

If the derived lexical item is handled as a separate entry the main link which signals 
the historical development is the etymology, which indicates that the entry is derived 
from a prior parent entry. A subordinate link may be implicit in the definition, for exam- 
ple ‘the action of the verb PAUSE’ at pausing vbl. sb. This method is usual in OED2 for 
handling items arising by means of suffixation. There is also, however, a version of this 
which operates as a halfway house: the ‘derivatives section’ appended to the end of the 
parent entry, saving space by implicitly inheriting many of the phonetic, etymological, 
and semantic features of the parent. The original plan of the OED did not envisage this 
device: it emerged in OED1 in the course of the letter A (at archbishop). It is so useful 
that it has been preserved in OED3 where saving page space is not a consideration. 

Lexical items that arise by conversion or ‘zero derivation’ can also be handled in four 
ways in OED2. They may constitute separate main entries. Under what circumstances 
this option is chosen is a subject to which we will return (see Section 13.8.1). They may 
also be handled within the structure of the parent entry. Structurally, such an entry con- 
sists of two or more ‘branches’ of equal rank, each headed by its part of speech and each 
allowed to contain the full hierarchy of numbered senses and subsenses. The derivative 
nature of a branch is indicated chronologically by its not being the first. In OED3, every 
entry contains as its highest sense level at least one ‘sense level 1’ labelled by a part of 
speech, and if there is more than one each acquires a serial letter (capital A., etc.). 
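
The branch structure just described can be pictured as a small data model. What follows is a sketch, not a description of the OED's actual data format: the `Entry` and `Branch` types and their field names are assumptions, and the pitiful example uses only the one OED3 sense quoted in this chapter.

```python
from dataclasses import dataclass, field
from string import ascii_uppercase

@dataclass
class Branch:
    part_of_speech: str
    senses: list = field(default_factory=list)

@dataclass
class Entry:
    headword: str
    branches: list  # chronological order; the first branch is the parent use

    def labelled_branches(self):
        """Pair each branch with its serial letter; a lone branch gets no letter."""
        if len(self.branches) == 1:
            return [("", self.branches[0])]
        return [(f"{ascii_uppercase[i]}.", b) for i, b in enumerate(self.branches)]

# Only the adverb sense is quoted in the text; the adjective senses are left empty.
pitiful = Entry("pitiful", [Branch("adj."), Branch("adv.", ["Pitifully"])])
labels = [(letter, b.part_of_speech) for letter, b in pitiful.labelled_branches()]
```

Because the list is kept in chronological order, the derived status of a branch follows from its position, as in the text: any branch that is not first is a later development.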

The most usual couplings of parent and derived lexical item within one entry are 
(each case may occur in reverse order): noun + adjective; adjective + adverb; adverb + 
preposition or conjunction; noun or adjective + interjection. Larger concatenations are 
possible, for example minus prep., n., adv., and adj. The one part of speech that never 
qualifies as a branch in such an entry is the verb. Its exclusion from this treatment car- 
ries the implication that lexical items generated by zero derivation from verbs, and verbs 
generated in this way from other items, instantiate a process that is different in nature 
from other kinds of conversion; it is certainly true that in the other cases it is generally 
possible to view the converted use as a shift in the use of a word within a particular 
clausal argument or adjunct, whereas a change to or from verbal use represents a change 
in clausal function to or from predicate.⁶ 

Thirdly, a change of word class can be handled within one of the hierarchy of senses. 
For example, at pitiful a., sense 3: ‘b. as adv. Pitifully.’ This is a quite common feature in 
OED2. In OED3, in accordance with the principle of discrete classification, practically all 
such senses have been elevated to the status of branches with their own appropriate part 
of speech (OED3, pitiful adj. and adv.: ‘B. adv. Pitifully’). 

Finally, a change of word class may be incorporated as an extension of the definition 
of the parent sense. As one might expect, in OED2 this device is used when the new 
lexical item is regarded as a rather minor, trivial, or ephemeral case. But in OED3, with the 
greater availability of documentation, many such subsumed items have been elevated to 
the status of branches, or even separate entries. For example, in OED2, redevable a. (and 
n.) has the definition: ‘Beholden, indebted. Also as n., a debtor.’ In OED3 (at redevable n. 
and adj.) the second sentence of this definition has become ‘A. n. a debtor’. Nevertheless, 
new ‘minor’ cases have emerged in their turn in OED3 for which this device continues 
to be useful (when the derived use is very rare); for example saditty adj. (and adv.) 
‘Affecting white middle-class values, esp. characterized by an air of superiority; 
conceited, “stuck-up”. Also as adv.’ (covering an example of ‘to act saditty’). 

Devices signalling a change of word class not incorporated as a separate branch 
include the labels ‘absolute’ and ‘elliptical’ (see section 13.7.3). 


13.4 GRAMMAR WORDS 


It is customary in linguistics to distinguish open-class words (nouns, most adjectives, 
main verbs, and some adverbs) from closed-class grammatical words (pronouns, some 
adjectives or determiners (if this term is used), modal and auxiliary verbs, some adverbs, 
and conjunctions). It has been argued that closed-class grammatical words should be 
described as part of the grammar system of a language and should not be included in 
dictionaries (see, e.g., Dixon 2010: 48). Such an approach may be applicable to the syn- 
chronic description of a language but would be inconceivable in a historical diction- 
ary such as the OED. This is mainly because diachronic linguistic change converts some 
open-class words into grammatical words, and vice versa. 

As is well known, open-class words can evolve into grammatical words over time 
(‘grammaticalization’). It is an essential part of the dictionary’s task to describe this 
evolution. For example, the verb ought (ought v.) has evolved from being the past tense of 
to owe to the condition of a pure auxiliary.⁷ Moreover, open-class words can develop 
senses that constitute fully grammaticalized lexical items, while retaining their original 


6 The New Shorter OED, in fact, allows a verb to form a branch of a composite entry of this kind. 
7 Strictly speaking, lexical uses lingered on in regional dialect and are not labelled obsolete in OED3. 
character in their other senses. For example, the verb to let (OED let v.¹), the original 
meaning of which is ‘leave’ or ‘allow to pass’ (Branch I.), and which retains a number 
of lexical meanings, developed in Middle English and still has an imperative 
auxiliary use with the infinitive (let us go). Or an inflected form of an open-class word may 
develop an independent use as a grammatical word, for example the conjunction pro- 
viding. Additionally, grammatical words can develop from compounds that start out as 
straightforward syntactic constructions: for example as and also from all so. 

Conversely, grammatical words can develop uses that are the equivalent of open-class 
items or of syntactic constructions mostly consisting of open-class items; they can be 
used idiomatically or phraseologically in ways which parallel open-class items: for 
example ‘this bacon is off’; ‘such behaviour is completely out’; ‘this programme is a must’; 
‘no ifs or buts’; ‘that coat is so you’; ‘I’d like to sit and just be’. Such a word may lose its 
original uses completely, as did thorough (thorough adj.), originally a variant of through, 
preposition and adverb, but subsequently differentiated in form in this use. 


13.5 ANOMALOUS HEADWORDS 
AND PHRASES 


In OED2, as has been mentioned, a number of entry-level word-class labels are com- 
pound terms ending in ‘phrase’ (phr.): in OED3 these are normally assigned to one of 
the canonical word classes (usually the label preceding the word ‘phr.’). ‘Phrase’ as a 
headword label is also found standing alone, and this feature has (with some reluctance) 
been adopted and regularized in OED3. The reason for this is that OED contains some 
headwords that can neither be analysed as members of any canonical word class nor be 
subsumed into any other entry (because no element within them is also the headword of 
an entry). These are typically multi-word items borrowed from other languages, some- 
times but not necessarily containing a verb; for example reculer pour mieux sauter phr. 

The more prominent use of the grammatical label ‘phrase’ in OED2 is within the entry 
to describe a multi-word item that has less coherence than a compound (where the 
identity and order of the elements are more or less fixed), being either an idiom that requires 
definition, or a matrix in which the word in a particular sense occurs with statistically 
significant frequency. In OED2 these are very often placed under the senses they 
instantiate and introduced within the definition by the word ‘phrase’, though in some entries 
they are gathered into a dedicated sense section. In OED3, many entries have been 
provided with a special ‘phrases section’, into which all uses of each phrase are gathered, 
bringing together uses in different senses. This has the advantage that the user can see 
in one place all the phrases into which a word enters, including all the different uses of 
identical phrases that were scattered in OED2 under different senses of the headword. 
Many entries in OED3 preserve the distribution of phrases among the sense sections; 
however, they are no longer introduced by ‘phrase’. 
In contrast to the phrase-like items that can be assigned to one of the canonical word 
classes mentioned above, there occur a few single-word lexical items whose structural 
role is similar to that of a full clause. These words usually have other uses that can be 
classified as regular parts of speech. For example, the earliest use of memorandum has 
essentially jussive force (‘it is to be remembered (that)’) but this is hardly to be inter- 
preted as an English verb in the imperative. The canonical word class that has the most 
similar qualities (i.e. standing alone and resembling a clause) is ‘interjection’, and so in 
OED3 such uses are classified as interjections (see memorandum int. and n.). 


13.6 GRAMMATICAL ASPECTS OF THE NOUN 


13.6.1 Nouns with Plural Form and Plurality 


OED2’s usual way of treating nouns ending in plural -s (or, in a loanword, the equivalent 
Latin or Greek plural termination, such as -i, -ae, or -a) is to label the headword ‘sb. pl.’ 
(i.e. noun plural). So, for example, ninepins is sb. pl. in OED2. Agreement with plural 
modifiers and verbs is taken as implied by the grammatical label at headword level. 

The OED2 approach has several drawbacks. First, if information about agreement is 
absent or conflicting, it demands that there be evidence that the word is etymologically 
a plural formation. Secondly, even if that information is known, it is not necessarily rel- 
evant to the subsequent use of the word after its origins had been forgotten. Thirdly, it 
becomes inaccurate if, as often happens, the word either develops a use in the singular 
form or develops a singular use (i.e. a use with singular concord) while remaining plural 
in form (and such a use can sometimes be treated as a singular base form to which a 
plural ending is added: compare media n.²). 

In OED3, the use of ‘noun plural’ as a category has been abandoned. Noun headwords 
that are plural in form are treated no differently from other headwords. If they are evi- 
dently plural formations in origin, this is shown in the etymology. Senses in which they 
are used like ordinary plural nouns, that is with plural concord, are shown by a plural 
definition and/or the formula ‘with pl. concord’. In OED3 ninepins is simply a noun, and 
labelling is employed to distinguish its use in plural form with singular concord (for the 
game) from its use as a count noun, usually in the plural, for the ‘pins’ themselves. 

Entries for plural nouns that have developed uses in the singular form can be given a 
separate branch for singular use. Thus pantalettes n., which normally denotes a pair and 
cannot differentiate numbers of pairs, has developed a singular form (Branch II, sense 
3 in OED3). Entries for plural nouns that have developed a use in the singular, such as 
pants n. 1b, can be given a sense labelled ‘with sing. concord’. 

Labelling senses of nouns with ‘with singular concord’ or ‘with plural concord’ is gen- 
erally a satisfactory way of dealing with noun plurality, as it is a neutral record of the 
behaviour of the noun in context. By contrast ‘in plural’ and ‘in singular’ are labels of 
form, the former implicitly referring back to the plural form specified in the inflections 
section of the entry (though for the majority of nouns this will be absent, implying a 
regular plural). 

Where a noun does not change for the plural, the inflections section (an OED3 
innovation) specifies ‘plural unchanged’, and this specification is used comprehensively 
for several different kinds of noun that without change of form refer to a number of 
items, especially nouns of nationality such as those in -ese. Contrast OED2, in which 
the unchanged plural use was frequently labelled within the definition as ‘collective’, for 
example at Bernese n. 1. Such uses, however, are not only collective, as an example like 
‘Five out of six Guyanese live in villages’ (at Guyanese 1.) shows. 


13.6.2 Countability and Collective Nouns 


In both OED2 and OED3, countability is not explicitly marked for every sense of every 
noun. It is generally shown through the definition. The advantage of this is that many 
senses can slip readily between the two categories with no significant change of mean- 
ing: it is virtually a syntactic option. Labelling for countability is generally employed as 
a contrastive device. In OED2 there is no fixed formula for mass nouns: ‘collective(ly)’ 
is one not uncommonly used (contrast, e.g., the definitions of ball n.¹ 5b in OED2 and 
OED3); for count nouns it is ‘with a and pl.’ OED3 has chosen simply ‘as a mass noun’ 
and ‘as a count noun’ (for an example of the latter, see absolute n. 3b). OED2 frequently 
uses ‘collective’, ‘collective singular’, and ‘collective plural’ as labels for senses of nouns 
that do not have interchangeable singular and plural uses. In OED3 these have been 
abandoned. The plurality of form or concord is handled in the regular way, while the 
collectivity is simply noted in the definition, usually with the word ‘collectively’. 


13.7 CATEGORIES OF WORD CLASS CHANGE 


13.7.1 Conversion 


As was mentioned above, a lexical item derived by ‘conversion’ or ‘zero derivation’ may 
be handled in the OED either as a separate main entry or as a part of the entry for the 
parent entry, by assigning a first ‘part of speech’ branch (usually labelled A.) to the 
original use and a second branch to the derived use (labelled B.). The question arises: what 
dictates the decision whether to assign a separate main entry or a branch? 

This is best considered in relation to noun/adjective pairs, as these are probably 
the commonest candidates for such a decision. The underlying assumption is that 
in English the noun and adjective classes are particularly closely related (compare 
the old-fashioned grammatical category ‘noun’, subdivided into ‘noun substantive’ 
and ‘noun adjective’, which is still reflected in OED1’s decision to describe nouns as 
‘substantives’). Conversion between them is so commonplace that for some instances 
it could be assigned to the syntax. For the development of nouns into modifiers, see 
Section 13.7.2. 

There are many kinds of related noun/adjective pairs which for practical purposes 
can be viewed as having come into existence together, even if actual printed records 
show one as prior to the other, especially those formed by suffixation from proper 
nouns. It is likely, for example, that recent coinages of political labels such as Paisleyite 
and of ethnonyms such as Padanian are best viewed as noun-cum-adjective items or 
items available in both functions, and the priority of use in one class over the other 
in print in the same year (1966 for Paisleyite as both adjective and noun, 1996 for 
Padanian) is arbitrary. Geological (and archaeological) period labels are similar; the 
frequent presence of the in the noun use suggests an underlying elliptical adjective use 
(compare the three examples quoted at Pebidian n.). In cases of this kind the decision 
to treat both parts of speech under one headword seems obvious. 

There are, however, other categories of homonymic pairs brought about by conver- 
sion where the criteria for using two separate entries rather than a single entry with 
branches are less clearly defined. Taking ‘colour’ terms as an example: in OED2 the entry 
for black is divided into an adjective and noun branch, while there are separate noun 
and adjective entries for white. The noun uses of the words are etymologically similar, 
being what OED2 called an ‘absolute’ use (see further Section 13.7.3) of the adjective. 
They are also not dissimilar in extent: black (OED3) adj. has 15 senses, noun 16 senses; 
white (OED3) adj. has 11 senses, noun 20 senses. It is hard to see what, other than the 
noun being larger in numbers of senses, and the adjective having a considerable number 
of special uses, induced the original editors of the OED to make white into two entries. It 
may have been a change in editorial outlook towards the end of the alphabet, or the pref- 
erence of an individual editor; but while nearly every one of the other long-established 
‘colour’ words (green, red, purple, and yellow) is a single entry with adjective and noun 
branches, blue, a close alphabetical neighbour of black, is also dealt with in two entries in 
OED2, of which the noun one is as much ‘the adj. used absol. or elliptically’ as the noun 
branch of black. In OED3 the noun and adjective uses of both white and blue are treated 
in unitary entries. 

The case of adjectives and related adverbs is based on the historical merger of the orig- 
inal adverb class, marked only by an -e suffix in Old English, with the parent adjectives, 
which meant that the use of an adjective in adverbial function became, in some varieties 
of English at certain times, virtually a syntactic operation. Where the unsuffixed adverb 
has remained in standard English (e.g. fast), OED1 made separate entries, but where the 
unsuffixed adverb was restricted to regional or poetic language it was usually treated as a 
sense, which is almost always upgraded into a branch in OED3. 

The reverse conversion (of adverb to adjective) is similarly handled with separate 
entries where the derivation is well established (e.g. well). It has become more frequent 
in recent times with phrase-like compounds of the type exemplified by off-stage adv. 
(1861) and adj. (1904), which usually constitute single entries with two branches. 
13.7.2 Nouns and Modifiers 


An important term in OED1 was ‘combination’, which generally speaking covers the 
same ground as ‘compound’ today. ‘Compound’ was available to the OED1 editors, 
and occurs from time to time, but ‘combination’ was preferred. Entries for open-class 
words, especially nouns and adjectives, but to some extent adverbs and verbs, frequently 
end with a section labelled ‘Comb.’ The concept was distinct from and wider than that 
of ‘attributive’, and many noun entries end with a section labelled ‘attrib. and Comb.’ 
Attributive uses, obviously, were those in which the noun acted as a premodifier to 
another noun, but it is unclear whether uses of this kind, if they had evolved sufficiently 
to require definition, were regarded as having acquired ‘Comb.’ status. 

The boundary between the category of noun and adjective is permeable in both direc- 
tions. Where a word can be used in both categories, the existence of one of the two uses 
is virtually always the result of historical change. In the case of the use of nouns as modi- 
fiers there is sometimes a problem of deciding either at what point in the evidence the 
change has happened, or whether in fact there is a case for regarding the modifying use 
of the noun as adjectival. In OED2, this uncertainty frequently occasions the use of tran- 
sitional categories, especially ‘passing into adj.’ (see Section 13.1.2). On the principle of 
discrete categorization, in OED3 a number of criteria are generally applied to decide 
word class, such as (a) the presence of an intensifier (e.g. ‘a voice more expostulatingly 
Oxford than ever’); (b) a noun phrase head which is little more than a nominalizer (e.g. 
‘the quantum nature of the exchange of energy’); (c) use of the noun predicatively (e.g. 
‘“She's church”—by which she meant episcopalian’). In accordance with the principle of
unambiguous use, the boundary is deemed to have been crossed only when such cases 
begin to occur. 


13.7.3 Absolute and Elliptical 


‘Absolute’ was much used in OED2. It reflects the essential idea that something which 
normally forms part of a larger construction stands alone with the same meaning as 
that construction. For its use in verb definitions, see Section 13.8.1. With adjectives and 
nouns it was a useful label since it could be used almost anywhere where the expected 
head of a noun phrase is absent and only the modifier is present. With adjectives, absol. 
indicates that an attributive (i.e. premodifying) use can also occur without an explicit 
head word. The ‘absolute’ use of an adjective when subsumed within a definition in 
OED2 is reclassified in OED3 as ‘noun’ or ‘as noun’ (e.g. rightful n. 1 and 2 in OED3;
sense 1b ‘absol.’ in OED2; Rhénan [B.] n. in OED3; Rhénan a. ‘Also absol.’ in OED2).
With nouns (especially eponyms) normally used attributively, absol. indicates that the 
noun can also stand alone with the same meaning as the full phrase. In eponym entries 
the ‘absolute’ uses of an attributive noun are treated as a second branch, labelled ‘Simple
uses’ in OED3 (e.g. Paul-Bunnell n. II. in OED3; ‘used ... absol.’ in OED2).

However, the occurrence of noun phrase constructions with a genitive noun as modifier
raises further questions. These constructions are of course very common in English
(e.g. chemist’s shop, St Mary’s church, Grindlay’s Bank, Parkinson’s disease), and so is the
use of the modifying genitive noun on its own to stand for the whole expression.⁸ Where
the referend is predictable (‘I’m going to the chemist’s’ (compare chemist n. 3b in OED3),
‘the family attended St Mary’s’) the transformation is virtually a feature of syntax. In
the case of businesses which potentially involve a number of people plying the same
trade there can be uncertainty about whether the genitive singular or genitive plural is
required, or indeed whether the word is simply plural (he took it to the
cleaner’s/cleaners’/cleaners: compare cleaner n. 1c).⁹ In certain cases, such as eponymic
disease names like Parkinson’s, Alzheimer’s, etc., the use is lexicalized and requires
specification in the OED. The premodifying uses (e.g. Parkinson patient, Parkinson’s patient
at Parkinson n. 2) are labelled ‘attrib. and in the genitive’. But the ‘absolute’ use (as in
‘common misconceptions that Parkinson’s is fatal’; Parkinson n. 3) cannot be readily
contrasted with the premodifying uses, since neither calling it a ‘noun use’ nor labelling it
‘in the genitive’ would signal a contrast with the foregoing uses, and there seems to be no
term in linguistics that picks out this particular use. OED3 has therefore retained ‘absol.’ in
this context alone.

‘Elliptical’ is rather similar to ‘absolute’ but with a more general range of applications 
in OED2. Its central use is to signify that an absent word or phrase is to be understood 
but that no change of construction is involved, e.g. disposed adj. 4b (= ‘disposed to mer- 
riment’), vouch v. 1b (= ‘vouch to warrant’), but it is also used in cases where there is 
effectively a change of construction or word class, for example humble adj. 1c (your
humble = ‘your humble servant’), last adj. 1d ‘the last day of a month’. In OED3 ‘elliptical’
is used in a general way to indicate the omission of a word or phrase supplied by the
context (see for example or conj.¹ 6b) and not to label a word class change.


13.8 GRAMMATICAL ASPECTS OF THE VERB 


13.8.1 Transitive and Intransitive 


Transitivity is the most prominent grammatical category in the verb entries of OED2. 

Curiously, transitivity within most verb entries is treated as a ‘toggle’ state, being set 
to either trans. or intr. at the first sense and maintained, without a repeat of the label, 
until it changes to the opposite state, and so on. For example, in clear v., sense 1 (within 


8 The first attested case of this (late fourteenth century) seems to have been Paul’s, for the cathedral in
London: see Paul's n. 3. 

9 Some business titles have tended to discard the apostrophe (e.g. Grindlays), while others (e.g.
Sainsbury's) to preserve it.

Branch I.) is labelled trans., sense 2 is labelled intr., sense 3 is again trans., and this label
continues in force, without being repeated, as far as sense 13 (in Branch III.), which is 
labelled intr. (and so on). In OED3 each sense has its own transitivity label but tran- 
sitivity is labelled at the highest number level if the nested senses all have the same 
transitivity. 
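
The difference between the OED2 ‘toggle’ convention and a scheme in which every sense carries its own label can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any OED software: the function name expand_toggle_labels and the list-of-pairs representation are invented for the example, and the data are limited to the labels reported above for clear v. (senses 4 to 12 are assumed to carry no printed label, as the description implies).

```python
from typing import Optional


def expand_toggle_labels(
    senses: list[tuple[str, Optional[str]]],
) -> list[tuple[str, str]]:
    """Carry each printed label forward until the next printed label.

    `senses` pairs a sense number with the label printed at that sense,
    or None where no label is printed (hypothetical representation).
    """
    explicit = []
    current = None
    for number, label in senses:
        if label is not None:
            current = label  # the 'toggle' changes state here
        if current is None:
            raise ValueError(f"sense {number} precedes any transitivity label")
        explicit.append((number, current))
    return explicit


# clear v. as described in the text: labels printed at senses 1, 2, 3, and 13.
clear_v = (
    [("1", "trans."), ("2", "intr."), ("3", "trans.")]
    + [(str(n), None) for n in range(4, 13)]
    + [("13", "intr.")]
)

explicit_clear_v = expand_toggle_labels(clear_v)
for number, label in explicit_clear_v:
    print(number, label)
```

Making the inherited label explicit at every sense removes the need to read backwards through the entry to find the state currently in force, which is the practical effect of the per-sense labelling described for OED3.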

In contrast to this, in the first edition, in a few large and medium-sized verb entries, 
transitivity was treated as a primary feature, in that either all the intransitive uses or all 
the transitive uses were grouped into a single branch, and other uses were allocated to 
one or more other branches. Examples include range v.¹, run v., start v., talk v., wend
v.¹ The entry pass v. is notable in that it contains Branch I. ‘Intransitive uses’, embracing
senses 1 to 27; Branch II. ‘Transitive uses. (From [Branch] I. 12-19)’, 28 to 38; Branch III.
‘Causative uses’, 39 to 54; and Branch IV. ‘With prepositions and adverbs’. ‘Causative’ is
not normally used as a major category (it is essentially an aspect of ‘transitive’) but here
it is used as a third state. The entry is further complicated by the inclusion in Branch
I. of a sub-branch introduced by *********** containing ‘Elliptical or absolute uses of
[Branches] II. or III’: so sense 27a represents an objectless use of the ‘causative’ sense 46b
‘to transfer (the ball) to another player on the same side’, while sense 26b, which refers to
forgoing a turn or bid in cards, perhaps represents an objectless use of sense 28a, ‘to leave
behind or on one side as one goes on’.

Since at least Middle English times verbs have readily been converted from transi- 
tive to intransitive use without requiring morphological change. Making the first struc- 
tural cut on the basis of transitivity is a priori likely to be unhistorical and misleading. 
In OED3 transitivity is regarded as an aspect of semantic development. In structuring an
entry it is subordinated to semantic grouping, and hence in large entries which in OED2 
were divided by transitivity, senses from the separate branches are brought together into 
semantic groups. In pass v., sense 1a, trans., ‘to exceed in excellence’ is what was OED2’s
37a, sense 1b is 37b, sense 2 is 36, 3 is 35, 4 is 34, all from Branch II, but 5 intr. is sense 19 
from Branch I.; and so forth. 

The term ‘absolute’ is frequently used in OED2 to label a use of a verb with no object
that directly corresponds to a transitive sense (e.g. baptize v. 1b): the idea being that the 
use is really transitive but the object is unspecified. In accordance with the principle of 
discrete classification, in OED3 such uses are treated as intransitive. 


13.8.2 The Transitivity of Prepositional Verbs


There are two ways of analysing verbs that are used idiomatically with an associated
preposition, such as to look after (= to care for) and to look up to (= to respect).
Some grammars (e.g. Greenbaum 2000: 273-4; Dixon 2005: 290) treat the object of 
the preposition (in these cases after and to) as the verbal object, and therefore regard 
these multi-word verbs as transitive: the type represented by to look after is sometimes
called a ‘monotransitive prepositional verb’ and that represented by to look up
to a ‘monotransitive phrasal-prepositional verb’.¹⁰ OED, taking a historical approach,
treats these two verb types as intransitive verbs followed by a preposition and its noun 
phrase, just as if there were no idiomatic sense involved (as in to come after and to come 
up to), because that is how they have evolved. 


13.8.3 Reflexivity 


The reflexive use of a verb, though of rarer occurrence, is treated in OED2 as a third state,
separate from transitive and intransitive, with the same ‘toggle’ status. In OED3 reflexive
is treated as a subcategory of transitive. So in plume v., sense 3 is trans., senses 4a, b, and
c are refl., sense 5 is intr., and sense 6 is trans. In OED3, the equivalent senses are 1 trans.
a., 4 trans. a. refl., 5 trans. (refl.), 6a trans. (refl.), 6b intr., and 4b trans.


13.9 GRAMMATICAL DESCRIPTION 
WITHIN THE DEFINITION 


OED2 generally uses a limited inventory of formulae to describe grammatical constructions,
and they mostly follow a commonsense approach. On the whole, narrowly technical
language is avoided. A very common expression is ‘const.’, used to introduce words,
especially prepositions and adverbs, with which a headword may be construed (this has
been replaced in OED3 by the even more basic ‘with’). Well-known grammatical terms
such as ‘object’, ‘complement’, ‘predicate’, and so on, are frequent. More carefully
specified varieties, when they occur, have the look of ad hoc uses rather than items from a
systematic nomenclature (e.g. ‘with resultant object’ at plough v. 1b and 4b). Position in the
clause or phrase seems to be used more than structural-sounding terms (e.g. ‘followed
by’; the series ‘In predicate’, ‘Following a sb.’, ‘Preceding a sb.’ at plenty II. 1a, b, and c is
more unusual). Certain constructions are treated as derived from other constructions,
but not exclusively when the implied development is supported by evidence: in other
words, here theoretical considerations override historical ones. For example, in verbs
the formula ‘intransitive for reflexive’ (or occasionally ‘by omission of the reflexive
pronoun’) is sometimes found when the intransitive use is of earlier date than the reflexive
use (e.g. ensconce v. 2c) or there is no corresponding reflexive use (e.g. knit v. 5b). (By
contrast, enlist v. 4 shows the intransitive use following historically after the reflexive.)
In OED3 similar principles are followed. The overall aim is to employ an established
inventory of formulae. However, these are not and cannot be slavishly followed, given


10 Furthermore, the types represented by to blame (something) on (someone) and to put (something)
down to (someone) can be called ‘doubly transitive prepositional verb’ and ‘doubly transitive
phrasal-prepositional verb’ respectively.

the complexities of construction through which lexical items evolve, and there are
occasions when more than just formulaic explanation has to be provided (see, e.g., old
adj. senses 4a–c, where the historical interconnections of constructions of the types a
two-year-old sheep, a two-year-old, a child of ten years old are elucidated). A supporting
aim is to attempt to let the evidence, rather than theoretical assumptions, dictate
the grammatical analysis. Where it seems useful, OED3 has adopted more systematic
grammatical terminology: for example, when necessary ‘to-infinitive’ and ‘bare infinitive’
may be distinguished (though ‘infinitive’ alone, implying ‘to-infinitive’, is normally
used). ‘Determiner’ is employed as a useful collective term in definitions, although it
has not been adopted as a word class label; ‘modify’ and its derivatives, especially
‘postmodifying’, are also used to describe constructions where convenient; ‘mass noun’ and
‘count noun’, and ‘singular concord’ and ‘plural concord’, already discussed, are other
newer terms consistently employed.


13.10 GRAMMATICAL CHANGE 


The OED aims to describe the vocabulary of English from the earliest to the most recent 
times. As part of this description, it makes use of a grammatical model or framework, the 
outline of which has been described above. However, there is no a priori guarantee that 
the same grammatical model that fits the English language of, say, 1950 should also fit 
that of 1150 or 1450. Fortunately it is arguable that the same major categories—the main 
word classes by which lexical items are classified at entry level—can be used to describe 
Old and Middle English as are used for modern English. But below this level there have 
naturally been some changes in grammatical structure which affect the way that the OED 
describes syntactic structures into which words enter (some of these changes centre on 
morphological elements to whose history the OED assigns descriptive entries). An exam- 
ple of this is the -ing form. 

The syntax of constructions involving the -ing form has varied considerably, even 
leaving aside the origins of the form, which are covered in the OED’s entries for -ing
suffix¹ and suffix². The -ing form started out as an ordinary noun-forming suffix (usually, in
Old English, in the form -ung, but also -ing, which became the dominant form in Middle
English), much like the later borrowed suffix -ment used in engagement, preferment,
etc. Hence OED2’s word class label ‘verbal noun’ is not inappropriate for words such as
warning n.¹ (Old English warnung) or greeting n.¹ (Old English greting). During Middle
English it developed a gerundial use which, in transitive verbs, could be followed by 
an object (e.g. 1477 ‘He is happy that vsith his dayes in doyng couenable thinges’), and, 
in any verb, could be modified by an adverb (1382 ‘There he wastide his substaunce in
lyuynge leccherously’); from being a noun-forming suffix it became additionally a verbal 
inflection. Hence for modern English there is a well-established distinction between the 
verbal noun with preceding determiner and a following of-phrase and the gerund with 
no determiner and a direct object. 

Applying this distinction to the English of the period from late Middle English to the
seventeenth century is not straightforward, however, because during this intermediate
period what OED2 (at -ing suffix¹) calls ‘mixed constructions’ were frequent, ‘in which
the word in -ing had an adjectival qualification with a verbal regimen, or conversely
an adverbial qualification with the construction of a noun followed by of’, for example
Sidney Arcadia i. iv. 15 b, ‘to fall to a sodain straitning them’; i. xii. 56 b, ‘by the well
choosing of your commandements’. The position adopted (with general but not total
consistency) by the OED is that all gerundial uses of the -ing form, even when modified by a
determiner, count as part of the verb. It took a while to develop this policy: numerous
entries in early parts of the alphabet in the first edition are like amplifying vbl. sb. which
has an editorial note ‘(Now mostly gerundial.)’ and a final gerundial (i.e. verb) example
‘1765 R. Lowth Let. to Warburton 86 He sets out with a formed design of amplifying his
subject’.


13.11 SIMPLICITY AND COMPLEXITY 


The vast majority of lexical items in English can be described in fairly simple grammati- 
cal terms in a historical dictionary. On the whole, OED2 is successful at this and suf- 
fers mainly from inconsistency, much of which is due to the twin factors of insufficient 
evidence and very protracted compilation by relatively independent editorial groups. 
OED3 has been able to benefit from three things: the power of the computer to enable 
its editors to analyse the grammatical descriptions employed in OED2 and to codify and 
control their own; the almost limitless supply of evidence available; and the work (espe- 
cially that which is on a historical basis) of English grammarians since the time of the 
first edition. The relatively small number of words with a really complicated grammati- 
cal history—mainly major grammar words, such as modal verbs—were less successfully 
handled by OED2, again mainly because of shortage of data. OED3’s editors are able to 
examine large quantities of examples, typically in KWIC (key-word-in-context) format 
arranged chronologically, and thereby obtain a clearer picture of the distribution of the 
different uses of the word under consideration. However, this does not always make for 
simplicity. For example, it has been possible to research in detail the variations in the 
modal uses of need v.² (Branch IV.), whether with or without verbal inflections, with or without
the to-infinitive, or with regular (do not need) or modal verb (need not) negation, but the
resulting complex presentation requires a degree of concentrated attention on the part
of the user. It is inevitably a challenge to communicate multidimensional information of 
this sort within the two dimensions of the dictionary entry. But in the majority of entries 
the step-by-step presentation of syntactic development is clear. 
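
For readers unfamiliar with the KWIC format mentioned above, the following Python sketch shows the general idea of a chronologically ordered key-word-in-context display. It is a hypothetical illustration and makes no claim about the tools actually used by OED3's editors: the function name kwic_lines, the regular expression ING_FORM, and the context width are assumptions made for the example, and the three dated quotations are those cited in Section 13.10.

```python
import re


def kwic_lines(quotations, pattern, width=30):
    """Return (year, left context, keyword, right context) rows, earliest first."""
    rows = []
    for year, text in sorted(quotations):  # chronological order
        for match in re.finditer(pattern, text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            rows.append((year, left, match.group(0), right))
    return rows


# Dated quotations taken from the discussion of the -ing form in Section 13.10.
quotations = [
    (1765, "He sets out with a formed design of amplifying his subject"),
    (1382, "There he wastide his substaunce in lyuynge leccherously"),
    (1477, "He is happy that vsith his dayes in doyng couenable thinges"),
]

# Assumed pattern: whole words ending in -ing, -yng, or -ynge.
ING_FORM = r"\b\w+(?:ing|yng|ynge)\b"

rows = kwic_lines(quotations, ING_FORM)
for year, left, keyword, right in rows:
    print(f"{year}  {left:>30} [{keyword}] {right}")
```

Aligning the keyword in a fixed column and sorting by date lets changes in the surrounding construction (for instance, a following direct object as against an of-phrase) be read off down the page.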


14.1 INTRODUCTION


The key defining characteristic of a historical dictionary is that it presents the histories
of individual words over time, grouping together material that shows a shared or
common historical development, and presenting in separate entries material that shows 
a distinct history. (For discussion of full or limited application of similar grouping of 
material in much synchronic lexicography, see Koskela, this volume.) This apparently 
simple criterion for the division and structuring of material in a historical dictionary in 
fact brings with it many challenges and judgement calls for the lexicographer, some of 
the main categories of which are identified and illustrated in this chapter. 

The basic unit in a historical dictionary is often characterized as being a presentation 
of ‘the history of a word’; see discussion of the history and development of this concept 
in Considine (this volume). The same is also true of many of the largest and finest ety- 
mological dictionaries: see Buchi (this volume). However, the concepts ‘a word’ and ‘its 
history’ are both abstractions, and it is important that both the authors and the users of 
historical dictionaries do not lose sight of this. The present chapter will examine some of 
the tensions inherent in these concepts, particularly through the lens of ‘words’ which 
show historical discontinuities or processes of historical split or merger, and which 
hence pose particular problems for the conception of a historical dictionary entry as 
‘the history of a word’. It will do so by close examination of some examples taken from
the Oxford English Dictionary, a historical dictionary in which work on the origins and
subsequent historical development of words is particularly closely integrated.¹


14.2 ETYMOLOGY, WORD HISTORY, 
AND HOMONYMY 


A fundamental task in historical lexicography is the grouping together of lexical mate- 
rial which appears to show a common origin, and then (in most historical dictionar- 
ies) arranging this material within a structure which reflects as closely as possible the 
likely historical development of each of the main meanings and characteristic patterns 
of usage. In a historical dictionary (as also for most historical linguists) questions of 
homonymy are closely tied together with questions of etymology: identical word forms 
in the same word class will normally be regarded as showing the same word if they show 
a common origin. (Compare the different approaches considered by Koskela, this volume.
On difference in word class, see Weiner, this volume, and Koskela, this volume.)

In practice, distinctions between homonyms vary between on the one hand cases 
where both etymology and meaning are clearly differentiated (for instance, file ‘metal 
tool’ of Germanic origin, and file ‘set of documents’, of French origin, which referred
originally to the wire on which a collection of documents was kept),² and on the other
hand cases where either or both of these distinctions are more difficult to establish, and 
lexicographical decisions are less straightforward. 


14.2.1 An Example in Detail: Twenty-two OED Entries 
with the Headword Spelling post 


We can observe some of the results of historical lexicographers’ decision making if we 
take as a practical example the OED’s twenty-two separate entries with the headword 
spelling post (or in one case post-). The OED has twelve separate noun homonyms spelt 
post, plus six verb entries, two adverbs, and one preposition, in addition to the prefix 
post-, with a hyphen in the headword form; all are full homonyms, identical in pro- 
nunciation as well as in written form. The following are very brief summaries of what 
I would take to be the most common meaning of each: 


1 For an interesting recent appraisal of the OED’s place among historical dictionaries and also in the 
broad field of etymological lexicography, see Considine (2013). On etymological dictionaries sensu 
stricto, i.e. dictionaries the primary purpose of which is to convey etymological information, see Buchi 
(this volume). 

2 For an explanation of basic etymological methodology using this pair of words as its starting 
point, see Durkin (2015); on etymological methodology in general, especially as applied in historical 
dictionaries, see Durkin (2009). 

• post n.¹ ‘post, pole, stake’
• post n.² (in Law) ‘the time after’
• post n.³ ‘delivery of postal matter’
• post n.⁴, the name of a card game
• post n.⁵ ‘office or position to which someone is appointed or stationed’
• post n.⁶, the name of an intoxicating beverage
• post n.⁷ (in bookkeeping) ‘entry in a ledger’
• post n.⁸, a term in papermaking
• post n.⁹ (in bookkeeping) ‘extra or additional entry’
• post n.¹⁰ ‘bugle call’
• post n.¹¹ ‘postgraduate’
• post n.¹² ‘post-mortem’
• post v.¹ ‘to attach to a post’, etc.
• post v.² ‘to send via a postal service’, ‘to travel in the manner of a post-rider’, etc.
• post v.³ ‘to station in a particular location, to appoint to a particular position’
• post v.⁴ ‘to pay down, to provide as security’, etc.
• post v.⁵ ‘to trample (laundry) in water’
• post v.⁶ ‘to perform an autopsy on (a body)’
• post adv.¹ ‘with post-horses, by means of the post; with speed or haste’
• post adv.² (in legal documents) ‘later in the same document’
• post prep. ‘subsequent to, later than; following, since’
• post- prefix ‘afterwards, subsequent, later than, situated behind’


Only one of these words (post n.’) is marked obsolete (and it overlaps in time with most 
of the other homonyms), although several others are rare, and post n.² is now restricted
to historical contexts. 

OED’s post n.¹ is a large dictionary entry describing a number of distinct senses developed
over time. It shows an Old English borrowing of Latin postis ‘doorpost, post, pole,
stake’, and the historical record suggests that there was probably a continuous history of
use from Old English to later times, although the possibility cannot be entirely ruled out
that all later use reflects (re-)borrowing of Latin postis and/or (Anglo-)French post in
the Middle English period.³ The material in this entry all shows as its ultimate semantic
point of departure ‘support or column of timber or (later) some other strong material’
and/or ‘doorpost’, although some of the later meanings (such as specific uses in geology
or mining or in sports such as basketball) result from several stages of semantic develop- 
ment, and the relationship may well be opaque to some language users. 

post n.² ‘the time after’ is very restricted in register, found only in Law (and now only
in discussions of legal history), ultimately reflecting a specific use of Latin post ‘after’ in 
the wording of writs. 


3 On difficulties of this sort in the history of English words of Latin and/or French origin see Durkin
(2014: 129-32, 251-53). 

post n.³ is, like post n.¹, another major dictionary entry, with numerous senses, and
covering a long chronological span, from the early 1500s to the present day. It ultimately
reflects Italian posta ‘stopping place’, a use as noun of the feminine of the past participle
of porre ‘to place’, although examination of the historical evidence suggests that it
entered English via French poste. It shows a number of different meanings, both abstract 
and concrete, relating to the delivery of postal matter, as well as extended uses (e.g. in 
the titles of newspapers or in computing). Its semantic development is closely related to 
the external, real-world historical development of postal services, and it shows numer- 
ous points of contact with the development of French poste, Italian posta, and words 
in other European languages also ultimately borrowed from these (e.g. German Post, 
Dutch post) in complex patterns of mutual influence that are difficult to establish with 
any degree of certainty. 

These first three homonyms are completely distinct etymologically: even their remote 
etymons show no connection. By contrast, post n.⁴, the name of a card game, does share
a remote etymon with post n.³, both being ultimately from Italian porre; however, the
two are kept separate, since their immediate etymons are different (that of post n.⁴ probably
being Italian posta ‘stake in a game’), and it is unlikely that any connection was felt
between the two even at the time of their earliest use in English. 

Similarly, post n.⁵ ‘office to which a person is or may be appointed’, ‘place where military
personnel are stationed or positioned’, another important word in modern English,
again shares Italian porre as its ultimate etymon. Like post n.³, it has a French form poste
as its immediate etymon, but the two French forms are distinguished in gender, as are
their Italian etymons, the Italian noun in this case being posto ‘place assigned’. Although
this is semantically not very remote from posta ‘stopping place’, the distinction in meaning
between the nouns in French and Italian seems consistently maintained, and, most
importantly, there seems to be no confusion of meanings in English; the separation
of post n.³ and post n.⁵ in OED therefore seems unproblematic.

post n.⁶ ‘an intoxicating beverage made by steeping poppy heads in warm water’ shows
a seventeenth-century borrowing from Urdu. 

post n.⁷ ‘entry in a ledger’ (in bookkeeping) brings us back to the extended word family
of Italian porre. Like post n.³, it corresponds to a use of Italian posta, but the English
use more probably reflects conversion from one of the set of verb homonyms in English
(post v.²), hence it is presented as a separate dictionary entry. post n.⁸, a term in
papermaking, also belongs ultimately to the same family, but its route into English was
probably via German Post, itself ultimately showing a specific use of the German equivalent
of post n.³

The very rare post n.⁹, again from bookkeeping, shows an ellipsis for post-entry, which
itself shows the prefix post-, and hence has no etymological connection with post n.⁷; the
two are therefore presented in separate entries in OED, even though they are homonyms
belonging to the same specialist field of discourse. Like post n.⁹, post n.¹¹ and post n.¹² also
show ellipses or clippings, from postgraduate and post-mortem respectively.

The remaining item in OED’s set of twelve noun homonyms, post n.¹⁰ ‘bugle call’
(most typically found in ‘last post’) is a difficult case for a historical dictionary, calling
for a judgement call from lexicographers. It probably arose from a specific use of post n.⁵
in military use, perhaps in a phrase such as ‘call to post’; however, certainty on this point
is elusive. It therefore seems the safest solution to place this in a separate entry, with
comments at each entry drawing attention to the fact that they may show developments
of the same word history. In a case like this it is impossible to avoid the nagging suspicion
that some further evidence could result in this being placed confidently under post
n.⁵, hence bringing together branching pathways of semantic change and enriching the
documentation of that word's historical development. 
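
The point that a shared remote etymon does not by itself decide entry division can be made concrete with a small sketch. The Python below simply groups the noun homonyms by the origin that the discussion above assigns to each; the dictionary ORIGIN_IN_TEXT and the function group_by_origin are hypothetical names introduced for the example, the origin labels are shorthand for the descriptions given above rather than OED etymologies, and post n.10 is left out because its origin is described as uncertain.

```python
from collections import defaultdict

# Origin of each noun homonym as summarized in the discussion above
# (shorthand labels chosen for this example, not OED wording).
ORIGIN_IN_TEXT = {
    "post n.1": "Latin postis",
    "post n.2": "Latin post",
    "post n.3": "Italian porre",
    "post n.4": "Italian porre",
    "post n.5": "Italian porre",
    "post n.6": "Urdu",
    "post n.7": "Italian porre",
    "post n.8": "Italian porre",
    "post n.9": "ellipsis for post-entry",
    "post n.11": "clipping of postgraduate",
    "post n.12": "clipping of post-mortem",
}


def group_by_origin(origins):
    """Map each origin label to the list of entries that share it."""
    groups = defaultdict(list)
    for entry, origin in origins.items():
        groups[origin].append(entry)
    return dict(groups)


groups = group_by_origin(ORIGIN_IN_TEXT)
porre_family = groups["Italian porre"]
print(len(porre_family), "separate entries share the remote etymon porre:")
print(", ".join(porre_family))
```

Five entries fall into the porre group yet remain separate headwords, which reflects the criteria described above: the immediate etymon, the route into English, and whether any connection was felt by speakers matter more than the ultimate source.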

Taking the verbs more briefly, post v.¹ and post v.² are both substantial dictionary
entries for semantically rich verbs which originally show conversions from post n.¹ and
post n.³ respectively. There are several points where the distinctions between homonyms
become rather less clear than in the case of the nouns. Notably, post v.¹ shows a number
of senses (the earliest dating back to the 1600s) relating to posting up information of
various sorts, originally conceptualized as attaching information to a post on a placard
or notice. The extension of this (long conventionalized) strand of meaning into the
world of computing, describing the posting of messages to mailing lists, etc., leads to
some ambiguity with post v.², which has as its core a number of uses relating to postal
services, in modern use especially to post a letter, with analogous uses in other fields,
such as in bookkeeping or accounting, or in computing, where data is described as being
posted to or into a particular location in a data structure. This assignment of particular
conventional uses to either post v.¹ or post v.² depends upon the construction of a
narrative of likely organic historical development for each word: senses are assigned on
the basis of their likeliest fit in a process of diachronic semantic change. However, use
of post in computing to describe the sending of messages or data to electronic mailing
lists appears to show plausible points of attachment with both of these word histories;
it probably developed primarily from earlier uses of post v.¹, but very likely also shows
influence from uses of post v.², and here OED provides an explicit note on the likely
cross-currents of influence:


Computing. To send (a message or data) to a mailing list, newsgroup, or other online 
forum on which it will be displayed; to display or make available online. Also intr. 


This sense shows the influence of post v.², esp. in those instances when the to construction
is used and when the message is sent only to people on a specific mailing list, rather
than being displayed on an open-access site. 


This sort of merger or near-merger places strain on the structures of a historical dictionary:
it is clear that post v.¹ and post v.² have different origins, but both homonyms have
probably influenced the development of this particular use, and the most that the his- 
torical lexicographer can do is to try to identify and document the different influences at 
play, as in the note here. 

The next two post homonyms present a different sort of complication: they are not definitely
of distinct origin from one another, but presentation as two separate dictionary
entries makes for a clearer presentation of complex histories. post v.³ is a conversion
from post n.⁵, with meanings relating to stationing something in a particular or strategic
position, or appointing someone to a particular position or posting. post v.⁴ shows the
meanings (in slang use) ‘To lay down, stake, deposit, pay down’ and ‘To pay or provide
as bail or security; to pay (bail)’; it could, like post v.³, also show conversion from post n.⁵,
but input directly from Italian posta ‘stake in a game’ (probably the origin of post n.⁴, the
card game) is also possible, and some association with post n.⁷ ‘entry in a ledger’ is also
conceivable. This uncertainty about the origin of post v.⁴, as well as the semantic distance
between the meanings at the two entries, provides good grounds for keeping post v.³ and
post v.⁴ separate, even though they may well in fact show a shared origin.

post v.⁵ and post v.⁶ are both relatively rare and clearly distinct from other homonyms: 
post v.⁵ is a Scots laundry term, originating as a variant of poss in similar use, while 
post v.⁶, like post n.2, shows an ellipsis for or clipping of post-mortem. 

post adv. ‘with post-horses, by means of the post, express; (hence) with speed or haste’ 
originates from phrasal uses of post n.3, while post adv. (like post n.’) reflects uses of 
Latin post ‘after’ in legal documents. 

post- prefix (from Latin post-) shows not a lexical item but a word-forming element; 
its inclusion as a historical dictionary entry permits minor formations in post- to be 
grouped together economically in one place (while more significant ones have full 
entry status); it also enables the historical lexicographer to summarize in one diction- 
ary location the history of such formations within English and the major patterns of 
foreign-language influence. The inclusion of such elements as headwords involves some 
(time-honoured) bending of category distinctions for beneficial pragmatic reasons; it 
also enables a dictionary to reflect well the rather fluid lexical development shown by 
items such as post preposition, a lexical item with the meaning ‘subsequent to, later than; 
following, since’ which arose by analogy from uses of the word-forming element post-. 


14.2.2 Some General Observations Drawn from This 
We can draw some general observations from this exploration of post homonyms: 


• Most historical dictionary entries depend upon establishing a pattern of branching 
historical development from a shared origin, and this is the typical shape of the 
history of a word. (To what extent this is typically an abstraction from much more 
complex patterns of diffusion and spread will be examined in Section 14.3.) 

• Nonetheless, it is not uncommon to find patterns of influence both among etymologically 
related clusters of words, and between words that are etymologically 
unrelated (as between post v.¹ and post v.² here), and the structures of historical 
dictionaries need to be flexible enough to reflect these. 

• Identity of the remote as opposed to the immediate etymon is of little structural 
importance (e.g. the fact that several of these words have Italian porre in their 
remote ancestry), although this can be interesting information for many readers 
and is typically encoded in a dictionary’s etymologies; it also often correlates with 
some degree of semantic similarity, and hence with a higher than average likeli- 
hood of mutual influence between words. 

• In some cases it will be uncertain whether particular uses do or do not show a 
shared origin, and, whatever decision is taken by the historical lexicographer on 
treating them as one or more different dictionary entries, the difficulties must be 
clearly flagged (as in the case of post n.° and post n."°, or of post v.³ and post v.⁴). 

• In most historical dictionaries, there will be some entries which do not correspond 
at all closely to a single lexical item (by most definitions), entries for word-forming 
elements such as post- being a rather extreme case, but these play an important part 
in a dictionary’s efficient presentation of key information about individual word 
histories. 


14.3 WORD HISTORIES THAT CHALLENGE 
THE NOTION OF ORGANIC HISTORICAL 
CONTINUITY FROM A SINGLE POINT OF ORIGIN 


In Chapter 10, Considine examines some of the stresses inherent in the revised OED entry 
for historical. It is not in doubt that the OED is here charting the development of an important 
English lexeme over time; what is a matter of real uncertainty is how much each of its senses 
owes to borrowing from Latin historicus, or to analogy with the development of the near 
synonym historic, or to influence from French historique or related words in other European 
languages, or to the influence of the related noun history or to other members of a somewhat 
extended word family. Furthermore, individual uses in each of its separately demarcated 
senses may themselves show a differing set of influences; before a particular meaning or pat- 
tern of use has gained general currency, it is quite possible that different instances of similar 
word use may reflect very different trajectories of development and influence. 

To some extent, such phenomena reflect the difficult balance of the ‘macro’ and the 
‘micro’ level in lexical history that is ever present in historical lexicography: most statements 
about the development of words over time are abstractions from a complex set 
of data, open to challenge on numerous points of detail when investigated closely; the 
historical lexicographer has constantly to chart a difficult course between catalogu- 
ing a confusing welter of fine details and multiple possibilities in which even the most 
dedicated reader may easily become lost, and presenting an overly smoothed, stream- 
lined presentation in which one explanatory narrative is favoured over all others, and in 
which significant strands of influence and development may be entirely neglected. The 
danger is perhaps nowhere greater than in the temptation to see each dictionary entry 
as a self-enclosed entity, neglecting the position of each lexeme within a set of lexical 
relations within the lexicon as a whole, and also often with equivalent words (often, but 
not always, cognates) in other languages. 

Frequently, even the ‘macro’ picture cannot be presented faithfully without describ- 
ing multiple inputs, either within a single historical period, or at different stages in a 
word’s development. The noun culture shows both of these factors rather clearly. It is first 
recorded in English in the 1400s. Like very many other words borrowed into English dur- 
ing the Middle English period, especially in the key period around 1400 when English 
was rapidly taking on new functions in various official and technical fields in both spoken 
and written use,⁴ culture almost certainly shows some input from both French and Latin. 
At this date it referred only to cultivation of the soil, but during the 1500s various uses 
connected with the education of an individual person begin to be found in English, again 
showing borrowing or influence from French and Latin. The other major element in the 
long-term ‘macro’ level history of the word culture in English comes from German. In 
German, the word Kultur was borrowed from French during the 1600s. However, a major 
development in the use of the word occurred in German during the 1700s, when Kultur 
came to refer not only to the education of the individual but to the (perceived) state of 
development of a whole society. This development in the use of the word in German 
had a huge impact on how the word was also used in English and in French. During the 
subsequent centuries, there has continued to be considerable mutual influence between 
English culture, French culture, and German Kultur. In a case like this, the etymology 
section of the OED entry provides a useful location for bringing together commentary 
on this complex set of foreign-language inputs, keyed to particular sense numbers in the 
main part of the entry, and also integrating some discussion of the complex, and chang- 
ing, set of relationships with semantically related terms such as civilization and society: 


OED3, culture n., etymology section (slightly abbreviated): 


< Anglo-Norman and Middle French culture (French culture) action of cultivating 
land, plants, etc., husbandry (12th cent. in Anglo-Norman), (piece of) cultivated land 
(12th cent. in Anglo-Norman), formation, training (13th cent. in Anglo-Norman), 
worship or cult of someone or something (14th cent. or earlier in Anglo-Norman), 
cultivation, development (of language, literature, etc.) (1549), mental development 
through education (1691), intellectual and artistic conditions of a society or the (per- 
ceived) state of development of those conditions, also the ideas, customs, etc. of a 
society or group (1796, after German Kultur) and its etymon classical Latin cultura 
cultivation, tillage, piece of cultivated land, care bestowed on plants, mode of grow- 
ing plants, training or improvement of the faculties, observance of religious rites 
(2nd cent. A.D. in this sense), in post-classical Latin also rites (Vetus Latina), veneration 
of a person (late 2nd or early 3rd cent. in Tertullian), training of the body (5th 
cent.) < cult-, past participial stem of colere to cultivate, to worship (see CULT n.) 
+ -ūra -URE suffix¹. In branch III, and especially in senses 6 and 7, also influenced 
by German Kultur, both directly and via French. The German word is a 17th-cent. 


4 See detailed discussion of this phenomenon in Durkin (2014: 223-80). 
borrowing < French, but the transfer of the meaning ‘state of intellectual development’ 
from an individual to the whole of a society occurred in German in the mid 
18th cent.... 


The sense development of the word in branch III, from the 19th cent. onwards is 
very complex; the term is frequently distinguished from CIVILIZATION n. (and to 
some extent also from SOCIETY n.), although the precise distinctions made differ 
greatly. In one important tradition originating in Germany in the 18th cent., the term 
is used to denote the (perceived) state of development of the intellectual life of a soci- 
ety (compare sense 6), but this was challenged (already before the end of the 18th 
cent. by the German philosopher Herder) by another (countable) use with reference 
to the ideas, customs, etc. of a society or of a group within a society (compare sense 
7); this has frequently been used in the context of rejection of normative or hierarchi- 
cal conceptions of the development of society, and hence with loss of the previously 
transparent connection with earlier senses at branch III. Additionally, in modern use 
in sense 6 the term is frequently used as a general term to denote the arts and other 
aspects of intellectual life, without any special reference to their historical develop- 
ment (nor to their connection with any particular society), and hence again with less 
transparent connection with earlier senses of the word. For an account of this pro- 
cess see R. Williams Keywords (1976) at culture. 


Contested uses of the term are often flagged, and mocked, by the use of altered 
spellings, for which see CULCHA n.... 


In the case of culture, there is no real doubt that we are dealing with the history of a 
single, polysemous, ‘word’: the challenge is in presenting the major influences and the 
most important inter- and intra-linguistic influences as clearly as possible within the 
constraints of a historical dictionary entry. 

In some other cases, there is very real doubt about the boundaries of particu- 
lar word histories, and about whether particular forms, meanings, and uses show a 
shared origin. A particularly difficult case is presented by modern English road. This 
word in the form road and in the meaning (very broadly) ‘path or way’ is first docu- 
mented in the late sixteenth century, and the traditional assumption has been that this 
shows a development from Old English rād (broadly) ‘action of riding’, which itself 
is ultimately from the same Germanic base as the verb ride, and which survives in 
other meanings in forms broadly similar to those of road down to the modern English 
period.⁵ (Modern English raid is also developed from Old English rād, reflecting 
a northern and Scots form; in the teleological perspective of OED it is treated in a 
separate entry, the etymology of which begins with a statement that it is historically 
a variant form of road: compare Section 14.5 on splits of this type.) However, mat- 
ters are complicated considerably by the existence in Scots of forms of the type rode 


5 There is some doubt whether the meaning ‘place where ships ride at anchor’ shown by the same 
range of forms in English and in some other Germanic languages does in fact show the same origin as the word 
meaning ‘action of riding’, but this is not centrally relevant to the discussion of the word meaning ‘path or 
way’: see further Durkin (2013). 
is likely that there have been multiple ‘coinages’, as different speakers have felt the same 
need and (in most cases unconsciously) filled that need by making use of the same lexi- 
cal resources, whether that should be the internal word-forming resources of English or 
the lexicon of another language in the case of lexical borrowing. Historical dictionar- 
ies can seldom pinpoint these multiple inputs that contribute in varying degrees to the 
eventual general currency of a new word (or indeed of a new meaning of an existing 
word), but it is very likely that the apparently ‘unusual’ cases discussed in this section are 
really just those where the data confront most brutally the assumption that word histories 
are at all likely to be clean, simple, or uncomplicated. 

The remaining two sections of this chapter look at two particular types of word his- 
tory, those involving lexical mergers or lexical splits, which pose special challenges for 
the structures of historical dictionaries.⁷ 


14.4 LEXICAL MERGERS AND 
THE CHALLENGES THEY POSE 
FOR HISTORICAL DICTIONARIES 


Some word histories show the merger of what were in an earlier stage of linguistic his- 
tory two distinct words, distinct in form or meaning or both, and showing distinct ety- 
mologies (either entirely separate origins, or distinct pathways from a shared point of 
origin). 

In some instances of merger, the meanings formerly realized by two distinct word 
forms come to be realized by a single word form. For instance, a fairly common pat- 
tern in the history of English is that a formally distinct causative verb becomes merged 
with the non-causative verb from whose root it was ultimately formed. Thus, the reflex 
of Old English meltan (transitive) ‘to melt (something)’ (weak verb, past tense mielte) 
merges in Middle English with the reflex of its ultimate parent meltan (intransitive) 
‘to melt’ (strong verb, past tense mealt); both verbs come to show regular weak inflec- 
tions, with regularized past tense melted, and hence modern English has a single verb 
melt (past tense melted) which means both ‘to melt’ and ‘to melt (something)’. (From 
a theoretical perspective, this process could be construed differently, without invoking 
merger: for instance, it could be argued that one verb has become obsolete while the 
other has acquired a new meaning. However, the practical challenges for a historical 
lexicographer remain the same, in that words that were formally distinct in parts of their 
paradigm have been replaced by identical word forms realizing both sets of meanings.) 
For a historical dictionary like the OED, with a teleological perspective anchored in con- 
temporary English, a practical solution is to treat all of this material in a single entry, 


7 For further discussion of both lexical mergers and splits and the challenges they pose for historical 
dictionaries see Durkin (2006, 2009: 79-88). 
with clear flagging in the etymological discussion and in the presentation of historical 
forms that originally distinct verbs have merged over time. 

Many instances of merger are rather more messy than melt. Modern English mare 
‘female horse’ is formally developed from Old English mearh ‘horse (of either sex)’; its 
change in meaning is one of the outcomes of a complex process of merger between this 
word and its feminine derivative mire, myre ‘female horse’. From approximately 1300 
onwards, formal reflexes of both mearh ‘horse (of either sex)’ and mire, myre ‘female 
horse’ are found in the meaning ‘female horse’ (and in various figurative uses); while 
in etymological terms we may be able to distinguish for instance the Scots form mere 
as formally a development from mire, myre and the standard form mare as formally a 
development of mearh, the convergence in meaning of both sets of forms as ‘female 
horse’ argues strongly for a combined entry in a historical dictionary anchored in con- 
temporary English. Thus in OED all of this material is presented together in a single 
dictionary entry mare, in which the various form types are catalogued and their varied 
origins explained. 

Rather greater lexicographical challenges can be posed by cases of partial merger, 
where polysemous words of distinct origin come to show semantic overlap in one or 
more of their meanings. OED has major entries for two polysemous words both of 
which have the form mean in modern English; one is a French loanword, borrowed in 
the Middle English period, with the core meanings ‘intermediate, intermediary’ and 
(hence) ‘moderate, middling, average’; the other is a word of Germanic origin, and in 
early use shows primarily the meaning ‘possessed jointly, common’, although there are 
some indications that a meaning ‘minor, inferior in degree’ may already have existed 
before the Romance word was borrowed into English. In Middle English and later, 
‘inferior in rank or quality, unpleasant’ comes to be a common meaning of mean; this 
probably developed from the word of Germanic origin (compare the similar semantic 
development shown by common), but there was probably a degree of convergence with 
use of the Romance-derived word in the meaning ‘(only) middling’. In a case like this, 
a historical dictionary has little option but to ‘plump’ for placing particular uses under 
one homonym or the other, and flagging prominently that the exact line of development 
is unclear, and some degree of merger or loss of distinction between etymologically distinct 
homonyms is likely. Compare similarly the case of post v.¹ and post v.² discussed in 
Section 14.2.1. 


14.5 LEXICAL SPLITS AND THE CHALLENGES 
THEY POSE FOR HISTORICAL DICTIONARIES 


An obvious challenge for the structures of historical dictionaries is presented by lexical 
material that at one stage in the history of a language constitutes a single word, undis- 
tinguished in form and with a single etymology, but which subsequently shows a split 
into two distinct forms, meriting separate treatment. This is especially the case for dictionaries 
like the OED (or most other ‘national’ historical dictionaries) which have an 
essentially teleological perspective, working from early times up to the present, with 
headword forms anchored in the present day. In contemporary languages with an exten- 
sively developed standardized variety and corresponding orthographic norms, such dis- 
tinctions are normally signalled by distinct written forms, as between metal and mettle 
explored in detail in the following paragraph. In some cases splits may also be signalled 
by distinct morphological or syntactic behaviour, as in the case of media, historically 
the plural of medium, or by distinctions in pronunciation, as between recreate ‘to create 
again’ (with long high vowel in the first syllable) and recreate ‘to reinvigorate, refresh, to 
amuse oneself’ (with short mid-height vowel in the first syllable, like recreation). 

Metal is a thirteenth-century borrowing into Middle English from Anglo-Norman 
and Old French (with maybe also some input directly from Latin). Its earliest and core 
use is to denote (from a modern scientific perspective) any of the metallic elements, or 
these elements taken collectively (defined by OED as ‘hard, shiny, malleable material of 
the kind originally represented by gold, silver, copper, etc.,... esp. as used in the manu- 
facture of objects, artefacts, and utensils’). It is also used in a range of other meanings 
developed from this core meaning, including, in the sixteenth and seventeenth centuries, 
the broadened meaning ‘material, matter, substance, fabric’. Closely related to this 
are the figurative meanings defined by OED as ‘A person's character, disposition, or temperament; 
the “stuff” of which one is made, regarded as an indication of one’s character’ 
and ‘A person's spirit; courage, strength of character; vigour, spiritedness, vivacity’. 
These are very similar in motivation to modern English figurative uses of the words sub- 
stance or fibre. The word shows a wide variety of spelling forms in Middle English and 
Early Modern English, including metal, metell, mettle. In modern English, the figurative 
senses noted above survive largely (but by no means only) in certain idiomatic phrases, 
especially to show one’s mettle. In modern use, this almost always shows the spelling 
mettle, while the spelling mettle never occurs (except as an occasional spelling error) 
for other uses; hence metal and mettle are listed as separate lexical items in synchronic 
dictionaries of modern English. Because the OED’s default perspective is that of con- 
temporary English (taking a broadly teleological approach), modern uses of metal and 
mettle are also placed at separate dictionary entries in the OED. However, this brings 
with it some challenging questions for a historical dictionary. In Early Modern English, 
the meanings listed above are not distinguished in spelling, and there is no evidence 
that they were perceived as different words by contemporaries. (There is no evidence 
that metal and mettle have ever been distinguished in pronunciation as opposed to in 
spelling.) There are frequent puns on the different meanings in Early Modern writers. 
Compare the two following instances, both from Shakespeare's Julius Caesar: 


See whe’er their basest mettle be not moved. (I.i.61.) 

Well, Brutus, thou art noble, yet I see 
Thy honourable mettle may be wrought 
From that it is disposed. (I.ii.305-7.) 
Or this from Henry IV Part Two: 


For from his [sc. Hotspur’s] metal was his party steeled, 
Which once in him abated, all the rest 
Turned on themselves, like dull and heavy lead. (I.i.116-18.) 


However, there is no reason to assume that these are anything other than puns on dis- 
tinct meanings of a single polysemous word. (To the extent that contemporary speak- 
ers make any clear distinctions between homonyms and polysemous words: compare 
discussion in Koskela, this volume.) A distinction in spelling begins to be found with 
any consistency only from the early eighteenth century. (The two quotations above from 
Shakespeare show modernized spellings, as do most modern editions of his works. In 
the early Quartos and Folios the spelling in fact varies considerably, and there is no con- 
sistent distinction between ‘metal’ and ‘mettle’: see OED3’s etymology section quoted 
later in this section.) 

In a case like this, a historical-etymological approach and the teleological perspective 
of contemporary English pull in different directions. The (almost) universally applied 
spelling distinction between metal ‘metallic substance’ and mettle ‘a person's character, 
disposition, or temperament’ reflects a perception that these are distinct homophones 
in contemporary English, and no synchronic dictionary of modern English will have 
any difficulty in deciding to make them separate dictionary entries. It likewise seems 
reasonable that a historical dictionary like OED with a teleological perspective anchored 
in contemporary English will have two separate entries, metal and mettle. On the other 
hand, there is no doubt that they show a shared origin, and what limited inferences 
we can make about perceptions in previous stages in the history of the language 
suggest that speakers did not perceive these to be separate words. The key difficulty for 
OED is therefore how to distribute material between the two entries. 

One approach could be to try to identify a point in time at which some writers began 
to make a clear distinction in spelling. Numerous English dictionaries, both monolin- 
gual and bilingual, survive from the Early Modern period onwards, and we might there- 
fore look to these for help. Searching mettle in Ian Lancashire's Lexicons of Early Modern 
English (LEME) database suggests that John Kersey’s New English Dictionary of 1702 is 
the earliest to clearly distinguish the two groups of meanings on the basis of spelling. 
Kersey has the following two entries, in their respective alphabetical places in the word- 
list, two pages apart in his dictionary: 


A Metal, digg’d out of the earth, as gold, silver, &c. also the breech of a great gun. 
Mettle, vigour, or sprightliness. 


A similar approach is taken in the sixth edition (1706) of Edward Phillips's New World 
of Words (which was edited by John Kersey), and also by Nathan Bailey's Universal 
Etymological English Dictionary of 1721. However, Samuel Johnson's Dictionary of the 
English Language includes a sense defined ‘courage; spirit’ under the headword metal as 
well as separately under the headword mettle. Johnson's entry mettle has the etymological 
comment ‘corrupted from metal, but commonly written so when the metaphorical 
sense is used’, while the corresponding sense at metal has the comment ‘in this sense it 
is more frequently written mettle’. Johnson does not make any explicit comment about 
diachronic change, but his approach is probably motivated by his extensive use of quo- 
tations from authors of the preceding century (and earlier): in this instance, both the 
sense at metal and the entry mettle are illustrated by quotations from a writer of the 
seventeenth century, Edward Hyde, first earl of Clarendon (in addition to other authors, 
including Shakespeare, in the case of mettle). 

Given this situation, it would thus be possible to place all evidence up to 1702 at the 
entry metal, and all evidence for senses now (almost) always spelt mettle at the entry 
mettle, on the grounds that this is the point at which dictionaries begin to recognize 
and codify the distinction. (Or, in a variant of this approach, all evidence for the figu- 
rative senses from 1702 onwards spelt mettle could be placed at mettle, but any exam- 
ples that retain the spelling metal for these senses could be placed at metal.) Although 
feasible, this would place a good deal of weight on the evidence of a cluster of early 
eighteenth-century dictionaries, which might well not accord precisely with what 
was suggested by a close analysis of spelling frequencies in large text databases; in 
this particular case, both Early English Books Online (EEBO) and Eighteenth-Century 
Collections Online (ECCO) could be drawn on to help provide an impression of fre- 
quencies either side of 1700. It would also be a solution that would not be so easily 
applied in periods for which the data are less rich, and also for earlier periods in which 
consistent graphic distinctions between homophones are less commonly found unless 
etymologically motivated. 

The solution that is instead generally adopted by the OED is one that is applicable in 
a much wider set of cases (including splits in pronunciation not clearly flagged by most 
historical records, as in the case of recreate), and one that is also applicable, in outline, 
for most other historical dictionaries of modern languages with long recorded histories. 
The split that is made is semantic, with those senses that are today normally realized 
by mettle placed at the entry mettle, and those which are spelt metal at metal. Thus, the 
entry mettle includes examples (in the sense ‘A person's character, disposition, or temperament’) 
with the spelling metal, as shown by its listing of forms (with ‘15’ meaning 
‘1500s’, ‘15–’ meaning ‘1500s to the present day’, etc.): 


OED3, mettle n., listing of spelling forms: 

Forms: 15 mettall, 15-16 metall, 15-16 mettal, 15-16 mettell, 15– mettle, 16– metal. 


The etymology section presents a fairly detailed account of the spelling history (and 
residual continuing variation), keyed in to particular senses and particular illustrative 
quotations in the main body of the entry: 


OED3, mettle n., etymology section: 
[Originally a variant of METAL n., now usually distinguished in form in the senses 
below. The form mettle was a variant spelling used in all senses in the 16th and 
17th centuries; in the figurative senses documented here, there are many cases of 
-al spellings and -le spellings occurring in the same contexts within a single work 
(compare quots. 1642 at sense A. 1 and 1642 at Phrases 2; for variation between the 
Quarto and Folio editions of Shakespeare see quot. 1598 at sense A. 2a), with the 
result that quots. such as 1604 at sense A. 2a should be regarded as punning not on 
two words but on two senses of a single word. The first dictionary to record the figurative 
senses under the spelling mettle separately from metal is Kersey’s New Eng. 
Dict. (1702). By the mid 18th century the form mettle becomes very rare in non-figurative 
senses, although metal persists in senses A. 1 and A. 2a. A distinction in pronunciation 
is unlikely in any period (compare E. J. Dobson Eng. Pronunc. 1500-1700 
(ed. 2, 1968) II. §264).] 


It is also essential that the existence of this entry is flagged at the entry metal, in this 
case (since metal is a significantly more substantial dictionary entry for a much 
higher-frequency word than mettle) with a clear but discreet note at the end of the listing 
of spellings (which includes mettle as a spelling for the meaning ‘metallic substance’ and 
related meanings): 


OED3, metal n. and adj., listing of spelling forms: 


Forms: ME mace (transmission error), ME macel (transmission error), ME 
mataille, ME matall, ME matalle, ME mataylle, ME matel, ME metail, ME metaile, 
ME metaille, ME metayl, ME metayle, ME metele, ME metelle, ME mettaill, ME 
mettayl, ME-15 metel, ME-15 metell, ME-16 metale, ME-16 metall, ME-16 metalle, 
ME-16 mettel, ME-17 mettal, ME– metal, 15 meatale, 15 meatall, 15 meatalle, 
15 metale, 15 mettalle, 15-16 metle, 15-16 mettall, 15-16 mettell, 15-18 mettle, 16 mattell, 
16 mettaile, 16 mettill; Eng. regional (north.) 17 mettle; also Sc. pre-17 matel, 
pre-17 matell, pre-17 mattell, pre-17 metale, pre-17 metall, pre-17 metalle, pre-17 
metell, pre-17 metle, pre-17 mettaill, pre-17 mettale, pre-17 mettall, pre-17 mettel, 
pre-17 mettell, pre-17 mettill, pre-17 17-18 mettal, pre-17 17– metal, pre-17 17– mettle. 
See also METTLE n. and adj. 


14.6 CONCLUSIONS 


At the heart of every historical dictionary is the assumption, conscious or unconscious, 
that each of its constituent units, each dictionary entry, is the history of a word. 
Identifying a shared immediate origin is a crucial starting point, and if this can take the 
form of a secure etymology (rather than the assumption lurking behind every statement 
of ‘etymology unknown’ that a common origin seems plausible but lacks the support 
of an established starting point), then most historical lexicographers will feel a certain 
degree of comfort. However, words are units in complex systems, showing complex patterns
of mutual influence, and historical dictionaries have to find flexible ways of identifying,
reflecting, and documenting at least the most important of these patterns; even
‘external’ influence from words in other languages will often occur during the history of 
a word, and not only at the initial point of origin. The identification of homonymy in his- 
torical dictionaries depends fundamentally on whether two units identical in form do 
or do not have a shared origin, but the application of this basic principle must be flexible 
enough to reflect those instances where language users’ association of units of distinct 
origins leads to mutual influence or even merger. In other cases, uses that show a single 
origin can become entirely dissociated in speakers’ minds; when this is accompanied 
by the selection of a particular variant form realizing a particular meaning, a dictionary 
will encounter a considerable challenge to the working assumption that ‘one dictionary 
entry = one word history’. Additionally, the ‘origin’ of many (perhaps most) words may
be better conceptualized not as radiation emanating from the Big Bang of a single point
of origin, but as the gradual coming together of multiple similar but distinct innovations 
each contributing to the emergence and growing establishment of a new lexical item. 
Such processes can rarely be traced in detail, still less documented in a dictionary, but an 
awareness of their likelihood can help lexicographers escape the danger of mistaking the 
necessary constrictions of dictionary form for the complex and often messy realities of 
the histories of words. 
TANIA STYLES


15.1 SETTING THE SCENE: PLACE NAMES 
AND GENERAL DICTIONARIES 


It is the policy of most general dictionaries not to include entries for place names, a
convention which has prevailed in the lexicography of most of the major European lan- 
guages for the last two centuries. The names of geographical places usually appear as 
dictionary headwords only when they are used allusively or in extended senses: that is 
to say, insofar as they either serve as elements of ordinary words or phrases, or come to 
function as appellatives by themselves. So, for instance, the Oxford English Dictionary 
(OED) has an entry for SEVILLE n. to cover the compounds Seville oil and Seville 
orange, not the (Anglicized) name of the city in Andalusia, Spain; and the entry DERBY
n. accommodates its use to denote a horse race, a sporting contest between two local 
teams, and a style of hat, but not the city in the east midlands of England. There are some 
exceptions to this rule—the Oxford Dictionary of English, for instance, since its first edi- 
tion in 1998 has aimed to treat ‘all those terms forming part of the enduring common 
knowledge of English speakers, regardless of whether they are classified as “words” or 
“names” ’ (2010: xi), and a similar policy has long been applied in some major dictionaries
in the United States. However, for detailed linguistic information about toponyms,
interested readers must still look beyond general dictionaries of a language to specialist 
dictionaries of place names.¹

There are sound reasons for this differential treatment, both theoretical and practi- 
cal. Place names, as proper nouns, have a linguistic status that sets them apart from the 
ordinary vocabulary of a language. For instance, whereas a common noun denotes a 
class of items with a set of shared characteristics that can be outlined in a dictionary 


1 For a full discussion of the history of this tradition and the motivation for its development, see
Marconi (1990). 
definition, a place name works more as a label or signpost, a distinctive way of referring
in speech or writing to a single geographical location. Although the semantic status of 
names remains controversial, it is generally agreed that they have reference (in the case 
of toponyms usually, and ideally, unique reference to a particular point on the map), but 
not sense (linguistic meaning that can be defined by a lexicographer). It is also debatable
whether place names can be said to ‘belong to’ the language in which they are embedded 
in the same way words do. Except in the case of a small set of major or culturally sig- 
nificant names—like Londres for London, or Vienna for Wien—or the names of places 
where more than one language is spoken, toponyms do not routinely have translation 
equivalents in different languages: the name used by the inhabitants or administrators 
of a locality is typically used by anyone who refers to it, regardless of the language they 
are using when they do so, and only minimally assimilated to the sound and writing sys- 
tem of that language. This means, in practical terms, that it would be impossible to list 
all the place names that might be used when speaking or writing a given language, since 
there is in principle no limit to this set.


15.2 SYNCHRONIC DICTIONARIES 
OF PLACE NAMES 


The function of a synchronic dictionary of place names is effectively served by gazet- 
teers, which list alphabetically the officially sanctioned forms of names within a particu- 
lar geographical region, and identify each one with the settlement or feature it denotes 
by means of a grid reference. The fact that they operate as lexically meaningless labels 
in everyday use means that place names are relatively easily transferred between differ- 
ent groups of speakers. However, in some countries where more than one language is 
widely used, two forms of the same toponym will coexist: for instance, in South Africa, 
the same city is likely to be known as Cape Town in an English linguistic context but 
as Kaapstad in an Afrikaans one. Alternatively, a single place may be known by two 
(or more) entirely unrelated names, as in the case of Fort William (named in English 
with allusion to the fortress built here during the reign of William III) also known as An 
Gearasdan (Gaelic for ‘the garrison’), or Swansea (a name of Scandinavian origin mean- 
ing ‘Sveinn’s island’) also known as Abertawe (a name given in Welsh meaning ‘(settle- 
ment at) the mouth of the River Tawe’). This situation creates the need for a special class 
of gazetteer with a role analogous to that of the bilingual dictionary, giving the approved 
form and spelling of place names when using each of the languages concerned: examples 


2 An introduction to some of the theoretical issues surrounding the linguistic status of names is 
given in Lyons (1977: 219-33) and more recently the articles on proper names by Hanks (2006b), Reimer 
(2006), and Lehrer (2006) in Brown (2006); more detailed discussions are presented in Anderson (2007) 
and (with particular reference to place names) Coates (2006). 
include the Gasaitéar na hÉireann / Gazetteer of Ireland (1989), Canada’s electronic
resource Geographical Names Approved in English and French (last updated 2006), and 
the online gazetteer of Scotland recently established by Ainmean-Aite na h-Alba / Gaelic 
Place-Names of Scotland (2010-). 


15.3 HISTORICAL DICTIONARIES 
OF PLACE NAMES 


Monolingual place name dictionaries are historical dictionaries, and their purpose is 
essentially etymological. While it is true that, to serve its purpose in everyday lan- 
guage, the name of any given place may as well have been assigned to it completely 
arbitrarily, in fact this is rarely the case. A place name typically starts out as a mean- 
ingful description of a place or of some attribute of it, made up of ordinary words in 
the everyday language of people who live there. Over time, one particular utterance 
becomes established as the single definitive way of referring to the place, the descrip- 
tive function becomes secondary to the purely denotative, naming function, and a 
place name is born. Eventually, thanks on the one hand to changes in the sounds and
lexis of language, and on the other to changes in the nature, function, and appearance 
of the place named, the original ‘meaning’ or etymology of the name often becomes 
obscured, misinterpreted, or forgotten altogether. Place name dictionaries there- 
fore attempt to supply information on three fronts: what are the linguistic units that 
make up the name (its elements); what is the sense of the utterance they make up (its 
meaning); and why was this description originally applied to the place in question (its 
motivation)? 

The methodology used in place name research, and the dictionaries that result, nat- 
urally differ somewhat from country to country and from publication to publication. 
Although reference will be made to lexicographical practice in a number of countries, 
the analysis that follows draws examples primarily from research into the toponymy of 
England. 

Although place names had begun to attract academic attention in a number of 
European countries by the end of the nineteenth century, England’s place names 
became the subject of rigorous philological study relatively early, in scholarship aris- 
ing from the same background as the OED. Important articles on English place names 
were published by Henry Bradley between 1881 and 1885, and much of the practical 
methodology of the modern discipline was pioneered by Walter Skeat, in a series of 
county place-name surveys published between 1901 and 1913. This activity led to the 
founding in 1922 of the national Survey of English Place-Names and in the following 
year of the English Place-Name Society (=EPNS), whose members’ subscriptions pro- 
vided a source of funding for the survey work. The Survey has now reached a fairly 
advanced stage of completion, and methods broadly similar to those used by the EPNS 
publications are now used in place-name studies in other Anglophone countries, as
well as elsewhere in Europe.³


15.4 SOURCES AND EVIDENCE 


As with most words, the origin of a place name cannot reliably be ascertained from 
its modern form. The same linguistic input can result in a wide variety of modern 
names: the same Old English word burh ‘stronghold’ gives rise not only to the English 
word borough and its Scots variant burgh, but also to names as diverse as Brough (East 
Yorkshire), Berry (Devon), Burgh (Cumberland), Burrough (Leicestershire), and 
Bury (Lancashire). Conversely, homonymous names can have different etymons: for 
instance, Ashton in Northamptonshire (Ascetone 1086: Gover et al. 1933: 210) appears 
to have started out as OE æsc tūn ‘ash tree farm, estate, or settlement’, whereas Ashton
in Devon (Aiserstone 1086: Gover et al. 1931-2: 487) appears to be the estate of a man 
called Æschere. In order to ascertain the likely etymology of a word, the historical lexicographer
must begin by assembling linguistic data in the shape of a variant forms
list, i.e. an organized collection of dated historical spellings, going back to the earliest 
recorded form.⁴ The place-name scholar’s task is essentially the same: to collect the
recorded spellings of an individual name from documents of various dates, tracing the 
written form of the name back in time as far as possible, in order to identify the mean- 
ingful description which is its etymon, and to explain the course of its phonological 
development over time. 
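
The variant forms list described above can be modelled quite simply. The following Python sketch is an illustration only: the `Attestation` class and the `earliest` function are hypothetical names introduced here, not part of any survey's actual tooling, and the data are limited to the two Domesday spellings quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attestation:
    """One dated historical spelling of a name (hypothetical structure)."""
    spelling: str
    year: int
    source: str


# Two places sharing one modern name, keyed by (modern form, county).
forms = {
    ("Ashton", "Northamptonshire"): [
        Attestation("Ascetone", 1086, "Gover et al. 1933: 210"),
    ],
    ("Ashton", "Devon"): [
        Attestation("Aiserstone", 1086, "Gover et al. 1931-2: 487"),
    ],
}


def earliest(attestations):
    """Return the earliest recorded form, the starting point for an etymology."""
    return min(attestations, key=lambda a: a.year)


for (name, county), attested in forms.items():
    first = earliest(attested)
    print(f"{name} ({county}): {first.spelling} {first.year}")
```

Keying the collection by modern form and county keeps homonymous names apart, so that identical modern spellings are never assumed to share an etymon.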

Primary research on toponymy—essentially, the historical lexicography of place 
names—is typically carried out on a regional basis. Alphabetical dictionaries of the 
place names of a particular region or country, of the names of a particular class of set- 
tlement or landscape feature, or relating to a particular historical period, often treat a 
selection of names, chosen according to the author's intended readership or focus, and 
present in summary form the findings of these more detailed regional surveys, where 


3 For an account of the beginnings of English place-name studies and further details of the 
contributions of Bradley and Skeat, see Rumble (2011: 29-34). This early scholarship is set in the context 
of activity in other countries in Spittal and Field (1990). For a national survey based on practice in 
England, compare, e.g., the work on the Flemish names of Belgium carried out by the Royal Commission 
on Toponymy and Dialectology, which named the work of the EPNS as its model in its first annual report 
in 1925. 

4 Although the written form of a name is the most important source of evidence for the place-names 
scholar, there are cases where its modern pronunciation can provide information not available in the 
written record. This is especially true in Norway, for instance, where the standard written language 
is highly influenced by Danish, and in Scotland, where names of Gaelic origin may be documented 
primarily by speakers of English and interpolated and spelled accordingly. In the latter case, compare 
Fraser (1999), which draws on sound recordings made in the 1970s and is now housed in the Scottish 
Place Names Society's Sound Archives.
such scholarship is available. The selection and presentation of material in alphabetical
dictionaries will be discussed in Sections 15.10-15.12. 


15.5 REGIONAL RESEARCH: THE 
UNDERLYING METHODOLOGY 


Where systematic national place-name surveys have been established, an editor or 
group of editors will usually undertake to treat the names in a given, defined geograph- 
ical area, often corresponding to a long-standing administrative unit (the historical 
counties in Britain and Ireland, and their equivalents in other countries). This is partly 
dictated by practical issues surrounding the collection of data. Many of the sources 
most likely to provide historical spellings of place names are documents relating to 
local administration, typically housed in regional repositories such as (in Britain) 
county record offices or cathedral archives, and an individual document is likely to preserve
a number of names of nearby places which it makes sense to collect en masse.
Secondly, focusing on a particular region enables the editor to build up the body of local 
knowledge necessary to interpret its names accurately: some names can only properly 
be understood in the context of naming habits that may be peculiar to that area, or as 
they relate to the names of other places nearby: the name of Norwell, Nottingham (‘the 
northern spring’) makes more sense when related to nearby Southwell, for instance. An 
understanding of the history of the local dialect is clearly vital in identifying the words 
that make up an individual name and explaining the changes they have undergone over 
time: Chalfield in Wiltshire is first recorded in a late copy of an Anglo-Saxon charter 
as Chaldfelde, and meant ‘cold field’, but without knowing that in the local West Saxon
dialect the ancestor of modern English cold would have been pronounced /tʃeld/, with
palatalized initial consonant, one would never suspect as much. EPNS county surveys 
conventionally include in their introductions a conspectus of the dialect features of the 
area in question, knowledge of which will have been a prerequisite for the accurate ety- 
mologizing of its place names, but which is likely to have been supplemented along the 
way by the author’s deductions about phonological features and processes arising from 
the study of its names. 

However, the identification of a place name's elements is often only half the story. Even
when the components of a name have been established beyond reasonable doubt, it can 
still be unclear what sense they have and how they relate to one another in a particular 
instance, or why the resulting description should have been applied to the place in question.
So in the case above, æsc tūn ‘ash tree farm or settlement’ might originally have
denoted a farm beside an impressive ash tree, a settlement in a copse of ash trees, a farm 
built of ash wood, or an estate where ash timber was grown or worked. Non-linguistic, 
historical information is required to decide the original meaning and motivation 
behind the name. The history of local administration, meeting places, infrastructure, 
and landholding can be vital in explaining the significance of many names, not least
those with manorial affixes (names like Sutton Courtenay or Hampton Lucy, where the 
surname of a medieval landholder is affixed to an earlier place name, to distinguish it 
from other places of the same name). Information on the landscape, geology, and even 
soil type of a locality can also enable the scholar to test a linguistically plausible ety- 
mology, or to decide between several possible explanations, and often such information 
can only be obtained by first-hand fieldwork. For instance, early forms of the name of 
Chalton, on the South Downs in Hampshire, suggest an etymology ‘chalk farm’, which
is supported by the fact that, when ploughed, the fields of the parish are strikingly white 
with chalk (Cole 1987: 47); the stream known as the Cheselbourne, in Dorset, appears
to have started life as cisel burna ‘gravel stream’, confirmed by the fact that its bed is
composed of fine flint chippings (Cole 1999: 20). Where early spellings are ambigu- 
ous, a visit to a place named Radford might determine whether it is situated at a point 
where a river is shallow enough to cross on horseback (suggesting radeford ‘riding ford’; 
compare Radford, Oxfordshire), or whether the soil on the river bed is red in colour 
(suggesting read ford ‘red ford’; compare Radford Semele, Warwickshire). Where name- 
worthy characteristics no longer survive above ground, it is particularly striking when 
local archaeological findings confirm linguistic ones, such as the discovery of a Roman 
tessellated pavement at Fawler, Oxfordshire, first recorded in an Old English charter as 
(to) fagan floran, ‘(to) the multi-coloured floor’, or the excavation of beaver bones near
Beverley (Beferlic c.1000 OE beofor + *licc ‘beaver stream’).⁶


15.6 REGIONAL SURVEYS: ORGANIZATION 
OF MATERIAL 


Factors like those discussed in the preceding section mean that first-hand place-name 
lexicography has generally been conducted region by region, and the results presented 
and organized accordingly. In England, the council of the EPNS appoints a single edi- 
tor to undertake to co-ordinate coverage of the place names of a particular county. 
The survey runs to more than 80 volumes, covering all but seven counties in full or in 
part, with work underway on all but one of these (Somerset). In line with the historical 
focus of the survey and with the organization of its source material, the counties used 
are the ‘ancient’ or ‘traditional’ counties that existed as administrative units until the 


5 See for instance the project Landscapes of Governance, funded by the Leverhulme Trust and based at
University College, London, which examines place names in the context of early medieval administrative 
units and the assembly sites associated with them. 

6 For further discussion of the symbiosis between place names and archaeology, see Cameron
(1961: 110-18) (omitted from the new edition of 1996), Gelling (1997), and (for a wider British 
perspective) the articles by Terrence James, Steffen Stummann Hansen, and Doreen Waugh in Taylor 
(1998). 
boundary changes instituted in 1974, when the Local Government Act 1972 took effect.
The result is a volume (or more commonly now, a series of volumes) which takes a rela- 
tively prescribed format, in which names are organized hierarchically, according to the 
administrative importance of the feature they name. Until the Second World War, sur- 
vey volumes were largely limited to ‘major names’—names of towns, villages, forests, 
districts, and larger rivers—but their scope has since expanded to cover the names of 
increasingly minor features: hamlets, fields, streams, streets, inns, and buildings. 

The county survey most recently completed is The Place-Names of Rutland (Cox 
1994), which serves well to illustrate the organization and scope of these publications. 
The editor aims to treat all the names appearing on the 1958-71 Ordnance Survey (OS) 
6” map, along with any other names of historical or etymological interest, such as those 
of deserted medieval villages. The county name is discussed first, along with the names 
of features which cover a large area of the county, like rivers and districts (e.g. ancient 
forests and vales). Place names are then treated within the divisions of the old Hundreds, 
moving roughly from north to south. Within each Hundred, the names of civil parishes 
are treated in alphabetical order, and within each parish, the parish name is followed 
by ‘names of primary historical or etymological interest’ (typically those of smaller set- 
tlements within the parish boundary), themselves arranged alphabetically. At the end 
of each parish section, all the remaining, ‘minor’ names on the 1958-71 OS maps are 
listed, with forms and etymologies as available. Street names are ordered alphabetically 
after those of the towns they belong to, followed by the names of prominent historical 
buildings and finally, inns and taverns.⁷ The final section covers the field names of the
parish, divided into those first recorded after 1800 and those recorded before this date. 
Each survey contains a historical introduction to the county, its landscape and history, 
together with observations on the phonology seen in its names. Place names are located 
within the volume by means of an alphabetical index. The words found in names of the 
county and referred to in their etymologies are listed, glossed, and illustrated in a sepa- 
rate index of place-name elements. 


15.7 STRUCTURE OF A SURVEY ENTRY 




County surveys are not organized in the alphabetical order usually expected from a dic- 
tionary, and yet the information they present is recognizably the stuff of historical lexi- 
cography, presented in a format similar to that of a historical dictionary like the OED. 
Take an example entry from the EPNS Rutland volume: 


ALSTHORPE (lost) 
Alestanestorp 1086 DB, Alestanthorp 1282 IpmR, Alstanthorp Edw 1 Ipm 


7 See Smith (1954). 
Alestorp(e) 1232 RHug, 1234 Pat, -thorp(e) 1300 Ipm et freq to 1319, 1334 Pat, 1350 Ipm
Alstorp 1202 Ass, -thorp(e) 1297 Ass, 1324 FF et passim to 1428 FA
Austhorpe 1610 Speed, -trop 1548 Pat, Awstroppe alias Alstroppe 1677 Deed


‘(E)alhstan’s outlying farmstead’, v. þorp. The DB form shows a glide-vowel e which
has replaced medial h. For the OE pers.n., v. Feilitzen 152-3.


The entry begins with a headword, usually the current form of the name, as listed in 
current OS maps and gazetteers. Here, the name is that of a deserted village, so the head- 
word is followed by a note on currency (or obsoleteness), indicating that the name is 
‘lost’ or no longer in use. Where a name is current and the modern pronunciation is 
unpredictable from the modern form or otherwise linguistically interesting, that will 
also be noted (as it is e.g. for Glaston [gleɪstən]). There then follows a list of dated variant
forms, each with an abbreviated reference to the source in which it appears, including 
the earliest recorded form and a selection of other forms, grouped into four types to 
illustrate the development of the form of the name over time. The name is first attested 
in Domesday Book in a spelling which shows the genitive of the Old English name 
Ealhstān (which in this area had the Anglian form Alhstān), and subsequent forms in
this group (where the genitive -es is lost) similarly show trisyllabic forms with a medial
syllable -stan-; the second, dating from the early thirteenth century, shows reduction of 
the medial syllable in the first group; the third, also attested from the thirteenth century, 
represents disyllabic forms in which the reduced medial syllable of the second group is 
lost altogether; and the last set, dating from the early modern period, shows vocalization
of -l- in the first syllable of the previous group. An original meaning for the name is
then posited, and an account given of the name's constituent elements: an attested Old 
English personal name (for which a reference in a specialist dictionary of forenames 
is given), and the element þorp ‘thorpe’, which acts as a cross-reference to the index of
name elements at the end of the volume. 
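
The structure of the entry just described can be summarized as a simple record. The sketch below is a hedged illustration in Python: `DatedForm`, `SurveyEntry`, and `first_attestation` are hypothetical names invented for this example, not an EPNS data format, and only a selection of the spellings quoted above is included.

```python
from dataclasses import dataclass


@dataclass
class DatedForm:
    """A spelling with its date and abbreviated source reference."""
    spelling: str
    year: int
    source: str


@dataclass
class SurveyEntry:
    """Hypothetical model of a county-survey entry."""
    headword: str
    status: str          # note on currency, e.g. "lost"
    form_groups: list    # groups of DatedForm, one group per spelling type
    meaning: str
    elements: list       # constituents, cross-referenced to the elements index


alsthorpe = SurveyEntry(
    headword="ALSTHORPE",
    status="lost",
    form_groups=[
        # trisyllabic forms with medial -stan-
        [DatedForm("Alestanestorp", 1086, "DB"),
         DatedForm("Alestanthorp", 1282, "IpmR")],
        # reduced medial syllable
        [DatedForm("Alestorp(e)", 1232, "RHug")],
        # disyllabic forms
        [DatedForm("Alstorp", 1202, "Ass")],
        # vocalization of -l-
        [DatedForm("Austhorpe", 1610, "Speed")],
    ],
    meaning="(E)alhstan's outlying farmstead",
    elements=["Ealhstan (OE personal name)", "þorp"],
)


def first_attestation(entry):
    """Return the earliest dated form across all groups of an entry."""
    all_forms = [form for group in entry.form_groups for form in group]
    return min(all_forms, key=lambda form: form.year)


first = first_attestation(alsthorpe)
print(first.spelling, first.year, first.source)
```

Grouping the forms by type rather than listing them in strict date order mirrors the printed entry, where each group illustrates one stage in the development of the name.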


15.8 PLACE-NAME ELEMENTS: CONVENTIONS 


The uncertain date of coining of most place names poses considerable challenges for the 
methodology and conventions of any place-name dictionary dealing with the toponymy 
of a territory with a very long history of continuous occupation, such as England. As 
seen in the examples discussed above, when words which have been inherited into or 
formed within English come to make up place names, they are conventionally referred 
to in the form they took in the earliest stage of the language, Old English, regardless 
of both the date at which the name is first recorded and the date at which it is likely to 
have been coined. This practice reflects the fact that most major settlement names in 
England are known to have originated in the Anglo-Saxon period (since, like Alsthorpe 
and the Ashtons discussed above, they are first recorded in Domesday Book (1086) or 
before), and that a significant proportion of the vocabulary preserved in place names
has since become obsolete. Especially at early dates, a name may have been in use for 
many years before it is first recorded in a source that happens to survive until the present 
day, and this convention avoids decisions about the date at which the name was coined. 
However, it can lead to the potentially confusing situation where a name likely to have 
been given in the nineteenth century can be etymologized from elements labelled as Old 
English, purely by virtue of the fact that the words that make it up happen to go back 
to the Anglo-Saxon period. For instance, there is no evidence that the name of Wild 
Goose Meadow (Dodgson 1971: 100) existed before it is first recorded in the tithe award 
of 1842-43, yet it is listed in the elements index under OE wilde-gos.

The same historicizing principle is applied when determining the headword form of 
place-name elements of non-English linguistic origin. Over the centuries, several lan- 
guages other than English have been spoken in England for considerable periods of time 
by significant populations. Each of these populations has used the ordinary words of its 
own language to give meaningful descriptions or denominations to settlements, struc- 
tures, and landscape features in its mother tongue, some of which have come to function 
as place names. By the same token, a different (though overlapping) set of words from 
each of these languages was borrowed into English, absorbed more or less seamlessly 
into the everyday vocabulary of native speakers: so some Celtic and Latin words used in 
Romano-Britain were adopted into the language of the Anglo-Saxons and naturalized 
into Old English; vocabulary both from Scandinavian languages spoken by immigrants 
from Denmark and Norway to the north and east of England, and from the French of the 
Norman overlords in the years after 1066 became part of the ordinary lexicon of Middle 
English, and available to its speakers for all kinds of purposes, including the description 
and labelling of geographical places. Because of the gap which must often exist between 
the original description that lies behind a place name and its earliest extant record, it 
can be difficult to know, for instance, whether a name first recorded in the thirteenth 
century was given in the Norman French spoken by followers of William the Conqueror 
in the late eleventh century, or coined later by native English speakers using ordinary 
Middle English words which happen to have French etymologies, even though these 
speakers may have been quite oblivious of their foreign origins. 

To overcome the difficulty of specifying the point along this historical and linguis- 
tic spectrum at which an individual name began, place-name elements of non-English 
origin are conventionally attributed to the language in which they were first used in 
England (whether or not there is evidence to suggest they were subsequently borrowed 
into English more generally). This has a number of implications. A number of river 
names, for instance, derive from words thought to originate in a Celtic language spoken 
in England before the Anglo-Saxon invasions for which written records do not exist; 
such words must be reconstructed on the basis of what is known about them from later, 


8 For similar reasons, Padel (1985) gives his headwords in Middle Cornish, since most important 
Cornish place-name forms date from this period, which is also that of the most extensive remains of the 
language, minimizing the number of words that need to be reconstructed. 
recorded stages of the languages in question and from accepted rules of sound change,
and distinguished from attested words by the addition of an asterisk.

Many names in the north and east of England first recorded in the late Middle English 
and modern English periods are composed of words of Scandinavian origin, and these 
constituent words will conventionally be presented as ‘Old Norse’ (or occasionally more 
specifically Old Danish, Old West Norse, etc.), and the citation forms drawn from the 
variety recorded earliest, literary Old Icelandic, despite the fact that they may have 
become part of the common or regional vocabulary of the English of the area at the 
date when the name was coined, and are at least as likely to have been created by English 
speakers of the area as they are by Scandinavian immigrants speaking a north Germanic 
language. A large number of the names etymologized from ‘ODan banke’ or from ‘OFr 
place’ will originate as descriptions in Middle English, using the English words bank 
and place, not in fact in Danish or French; and indeed these elements are still available 
today for developers to apply to streets in new housing estates in any country where 
English is spoken. Some recent county surveys (such as that for Leicestershire) and 
national dictionaries, such as the Cambridge Dictionary of English Place-Names (CDEPN
2004), have attempted to avoid this problem by giving etymologies in Middle English or
modern English when the evidence and balance of probability suggests a date of coinage 
within either of those periods, followed by the Old English form in brackets. 


15.9 DICTIONARIES OF PLACE-NAME 
ELEMENTS 


Alongside those which take individual names as their headwords, dictionaries also 
exist which take place-name elements as their object of study. Again, the raw material
for these dictionaries will typically be the data as it has been processed in the detailed 
regional surveys discussed in previous sections. Collecting a corpus of those names 
across the country which are considered to contain an individual word, and identifying 
patterns in, for instance, the kinds of site or feature to which it is commonly applied, the 
elements with which it tends to combine, and the geographical distribution of names 
which contain it, can yield important information about that word’s sense, grammar, 
phonology, and etymology. These elements dictionaries are not dictionaries of names 
but of words, originating in various languages, as they have been used in place names; 
that is, historical dictionaries based on onomastic rather than literary evidence. The 
most recent works of this kind for English place names are at pains to stress this link 
with other historical dictionaries by ending each entry with a cross-reference to the 
corresponding entry in, for instance, the Dictionary of Old English, the Anglo-Norman 
Dictionary, the Middle English Dictionary, and the OED.

Several generations of elements dictionary now exist for English place names, and 
the presentation of information tends to follow a relatively set formula. Typically, the 
headword (following the conventions given above) is followed by grammatical information 
about the word (part of speech and, where relevant, grammatical gender), and a brief 
definition of its core sense. Brevity is appropriate here since, as discussed in Section 15.5, 
ascertaining which of the many senses a word has in an individual place name can be more 
problematic than when dealing with a text, since a word in a place name comes supplied 
with far less linguistic context: often only one other word, and sometimes not even that. 
On the other hand, the referential function of place names can make it possible to identify 
a word in a place name with the particular feature it denotes in the landscape. Elements 
dictionaries collect together names from throughout the country which contain the same 
lexical item and identify patterns in the kinds of site they apply to, which can reveal seman- 
tic information which is not available in other kinds of source, and hence not recorded in 
other kinds of dictionary. This is especially true when linguistic findings are combined with 
fieldwork. Pioneers of this approach in England are Margaret Gelling and Ann Cole, whose 
study of words for landscape terms in place names has yielded the information, for instance, 
that in areas where the two terms are both used, OE brōc (the ancestor of the modern word 
brook) typically refers to a sluggish stream with a muddy bed, whereas a clear, fast-flowing 
stream is more likely to be described as a burna (modern burn) (Cole 1990-1: 37; Gelling 
and Cole 2000: 7). Place names are also eminently locatable, and hence mappable, so that 
elements dictionaries are able to present and discuss information about the regional distri- 
bution of words, and the ways in which their form may vary over space as well as time. 


15.10 ALPHABETICAL 
DICTIONARIES: SELECTING A CORPUS 


Dictionaries of place names, and place-name studies more generally, typically aim to treat 
the names of places located within a particular, well-defined geographical area. The Concise 
Oxford Dictionary of English Place Names, for instance, aims to treat the names of places 
in England, not place names which can be considered part of the English language, or the 
names of places given in English. This terminological distinction is important: on the one 
hand, there are a very large number of names given in English outside England (in Canada, 
New Zealand, and the United States, for instance), and on the other, names of many towns 
and villages in England were originally coined in languages other than English: Latin, 
French, Celtic, or early Scandinavian. 

When it comes to arriving at an inventory of names to be treated in a given area, many 
editors turn to information assembled for other purposes, such as mapping, adminis- 
tration, travel, or tourism. For many countries, state bodies exist to record, approve, and 
standardize the nation’s toponymy, often providing official lists of authorized names in the 
form of national gazetteers and, increasingly, databases, from which onomasts can select 
their dictionary’s headforms.⁹ So, for instance, Rayburn’s Dictionary of Canadian Place 
Names (1999) treats 6,225 names drawn from the collection of some 450,000 official names 
listed in the Canadian Geographical Names Database, maintained by Natural Resources 
Canada in Ottawa for the Geographical Names Board. 


9 The United Nations Group of Experts on Geographical Names advises on the standardization of 
names and their spellings at the national level and their representation in various linguistic contexts and 
scripts. A list of the national names authorities in various countries is given on their website: see 
<http://unstats.un.org/unsd/geoinfo/UNGEGN/countrylinks.html>. 

In Great Britain and Ireland, this role is effectively served by the Ordnance Survey, 
and dictionaries of names in various parts of the British Isles take as their starting 
point the names recorded in a particular current map series, atlas, or gazetteer, selected 
according to the level of detail the work is intended to achieve and the specific needs 
of its target audience. The EPNS survey volumes discussed in Sections 15.5–15.7 are 
intended to provide detailed, definitive, and systematic coverage of the names of a given 
county, primarily for an academic readership, and the society's guide to the preparation 
of such volumes (Smith 1954) instructs editors to begin by collecting all the names on 
the 6″ Ordnance Survey maps of England, which record features as small as field bound- 
aries and individual buildings. One-volume dictionaries like Mills’s Dictionary of British 
Place-Names (DBP) and the CDEPN draw their headword lists from popular touring 
atlases like the Ordnance Survey Road Atlas of Great Britain (1983 ff). In recent years, 
the EPNS popular series has been attempting to disseminate the findings of the survey 
volumes beyond academia; the first of these, Whaley’s Dictionary of Lake District Place- 
Names (2006), is conceived as a handy guide to names in the National Park for interested 
visitors to the area, and consists of a single volume, including all names listed in the 1994 
OS Landranger (1:50,000) maps of the Lake District that it is intended to accompany in 
the hiker’s rucksack, in stark contrast to the five EPNS volumes which cover the same 
area. Owen and Morgan's Dictionary of the Place-Names of Wales (2007) similarly draws 
on names appearing on the OS Travelmaster (1:250,000) map of Wales and the West 
Midlands, with the aim of providing coverage of the country in a single volume, and 
the classic work on Irish toponymy, Joyce's Irish Names of Places (1869-1913), takes its 
headword list and much of its data from the Ordnance Survey Namebooks, assembled 
during the surveying of Ireland in the 1820s and 1830s in preparation for the issue of the 
6″ maps of the country. 


15.11 INCLUSION POLICY 


Once a corpus of names has been chosen, many dictionaries will still find it necessary 
to treat only a selection of names from the pool, and a number of principles are gen- 
erally used. There is a core of ‘major names’ which is routinely covered by dictionar- 
ies. This tends to include first and foremost the names of settlements (cities, towns, and 
villages), with more populous settlements generally deemed more worthy of inclusion 
than smaller ones. The names of the major administrative divisions of a country are also 
likely to figure as entries (such as counties and hundreds in England and Scotland), and 
the names of settlements with significant administrative status (county towns and par- 
ishes in England and Scotland, townships in Ireland) are generally privileged over those 
without. 

The names of natural geographical features (rivers, lakes, mountains, hills, capes, 
forests, bays, and so on) tend to have less coverage in general place name dictionar- 
ies than those of human habitations, but again, the larger the feature, the more likely 
it is to be included in a given dictionary.¹⁰ More specialized reference works devoted 
either to the naming of the natural landscape in general or to the names and nam- 
ing of particular classes of feature, both alphabetically arranged and otherwise, also 
exist (compare Gelling 1984; Gelling and Cole 2000; Ekwall 1968). So-called ‘minor 
names’, or microtoponyms (names of estates, farms, fields, streets, and individual 
buildings) are often included only in the most detailed onomastic surveys, although 
in recent years, as the coverage of major names has progressed, exhaustive, small- 
scale studies of the toponymy of a given area have become increasingly common in 
several countries, and again, alphabetical dictionaries of individual classes of minor 
name also exist (such as Field’s English Field-Names: A Dictionary (1972) or Cox’s 
English Inn and Tavern Names (1994)). 

Historically, names which are demonstrably older have also been given priority in 
the more scholarly place-name dictionaries over those which are only recorded more 
recently. Ekwall’s Concise Oxford Dictionary of English Place-Names (1960) took as its 
corpus the names of places in England listed in Bartholomew’s one-volume Survey 
Gazetteer of the British Isles (1904 ff), a canonical work at the time, but omitted some 
major names listed by Bartholomew, either because they were ‘of late origin’ or because 
they were only recorded after 1500, and added some names of farms and estates he con- 
sidered ‘old and etymologically interesting’ (Ekwall 1960: ix). This bias has been justi- 
fied by the argument that, on the one hand, names coined in the modern period are 
likely to be etymologically transparent or ‘self-explanatory’, and on the other, that some 
late-recorded names are likely to be older names for which early evidence happens not 
to survive, and in the absence of such evidence, it is unwise to speculate about their ety- 
mology. More recent dictionaries of English place names such as CDEPN have tried to 
redress the balance by aiming to cover all of the names in their chosen atlas or gazetteer, 
regardless of their antiquity. In addition, dictionaries of modern place names (such as Room's 
A Concise Dictionary of Modern Place-Names in Great Britain and Ireland (1983)) have 
been written with the express purpose of filling the resulting gap, including only place 
names first documented after 1500. 


10 For example, the county volumes of Danish place names (Danmarks Stednavne) include primarily 
habitation names, with names of natural features covered only in selected volumes. The size of natural 
feature or settlement as an inclusion criterion is defined very precisely, e.g., in Rayburn (1997), which 
includes, for instance, all incorporated settlements with populations of 150 or more, all lakes and islands 
with an area of 350 square km, and all rivers with lengths of 75 km or more (1997: vii). 


15.12 PRESENTATION AND STRUCTURE 


An entry in an alphabetical place-names dictionary will typically have the name of an 
individual place as its headword, usually the current form of the name, as given in the 
gazetteer or map which sets its corpus, together with some indication of where the place 
in question is located (generally, in Britain, the name of the county in which it is located, 
and sometimes also a grid reference). The date at which the name is first recorded, and 
usually the form it had at this date, will then be supplied, sometimes supplemented by a 
few other forms illustrative of the name’s development over time. Finally, the elements 
which make up the name will be given, accompanied by a brief discussion explaining or 
supporting that etymology where this is deemed necessary. 

Broadly speaking, the division of place names between entries is decided according 
to historical principles, though only insofar as this would not impede the reader with 
no prior knowledge about the origin of a name from finding it in the dictionary. So in 
DBP for instance, Ashington in Northumberland (OE æscen + denu, ‘valley where ash- 
trees grow’) and Ashington in West Sussex (analysed as ‘farmstead of the family or fol- 
lowers of a man named Æsc’ and etymologized from an OE patronymic term + tūn) 
each has its own entry, one following the next, reflecting the distinct origins of the two 
names. In cases where a significant number of names with the same modern form and 
the same underlying etymology exist, however, they will often be treated together in 
the same entry for ease of reference. So (returning to the examples discussed in Section 
15.4), in DBP the entry for Ashton begins ‘a common name, usually “farmstead where 
ash-trees grow”, OE æsc + tūn’, and Ashton in Northamptonshire appears in the para- 
graph of examples which follows, together with nine other places called Ashton in vari- 
ous parts of England with the same origin, many of which are now distinguished from 
one another by the addition of an affix. Importantly, within that list, each name is dated 
and exemplified individually, reflecting the fact that each can be seen as representing 
an individual naming event, and even names composed of the same elements will have 
been independently given at different times and in different circumstances. The few 
homonymous names with other origins (such as Ashton in Devon, ‘Æschere’s farmstead’) 
are then discussed and exemplified within the same entry in a following paragraph. 
Where the same etymological input results in divergent modern forms, these are treated 
under separate headwords. Although Aston-on-Clun, Shropshire happens, like Ashton, 
Northamptonshire and most of the other places with that name, to derive from OE æsc + 
tūn, it is not treated in the same entry as them, but in the entry for Aston, on the grounds 
that a reader consulting the dictionary could not be expected to identify it as a variant of 
Ashton (especially since most places called Aston have quite a different origin, meaning 
‘east farmstead’). 

In dictionaries which focus on recently coined names, such as the English names of 
countries in the New World, the etymological focus will necessarily be less on the iden- 
tification of elements and more on the motivation for naming. Many of Australia’s place 
names of English linguistic origin, for instance, were given by explorers, colonists, pros- 
pectors, and land surveyors as recently as the late eighteenth and nineteenth centuries 
and remain morphologically transparent to English speakers today; yet dictionaries are 
still required to explain that Botany Bay, for instance, represents a re-naming by James 
Cook of the cove he originally called Stingrays Harbour, in recognition of the abun- 
dance of plant specimens collected there by botanists Joseph Banks and Daniel Solander 
in Spring 1770, or that Ayers Rock (or Uluru) was so named on 19 July 1873 by the sur- 
veyor William Gosse in honour of the then Chief Secretary of South Australia, Sir Henry 
Ayers. Such dictionaries tend to take the form of alphabetical lists of more discursive 
‘stories behind the name’. 


15.13 SOME OTHER DICTIONARY TYPES 
IN BRIEF, AND NEW POSSIBILITIES 
PRESENTED BY ONLINE PUBLICATION 


Alongside the more scholarly dictionaries of place names discussed above, a number of 
other kinds of source exist, and several deserve mention. Some dictionaries cast their 
geographical net wider than an individual country to cover ‘World Place Names’. These 
texts are typically more derivative than their more narrowly focused counterparts, 
drawing the information they present from various published works but giving only a 
vague indication of the main sources relied upon in a select bibliography or brief intro- 
duction. Editors of texts of this kind (which are necessarily extremely selective in terms 
of which of the world’s place names they include) do not usually set out their selection 
criteria, although factors such as the size of settlement or feature named and the avail- 
ability of relevant or interesting information seem often to play a part. 

At the other end of the geographical scale, scores of amateur dictionaries, each deal- 
ing with the names of a particular locality and aimed at the tourist market, are also 
available in local libraries and gift shops. Online publication offers some exciting new 
opportunities in this discipline. For instance, an interesting recent development with 
a sociolinguistic dimension is the Location Lingo project 
(<http://www.englishproject.org/activities/location-lingo>), a crowdsourcing enterprise established as part of 
the English Project at the University of Winchester for the collection of contemporary 
nicknames for places, which promises to provide interesting information not presented 
on maps or in gazetteers, especially for such modern entities as housing estates, parks, 
buildings, and neighbourhoods. In the mainstream of historical name studies mean- 
while, technological advances now look set to foreground the status of this research as 
a branch of historical lexicography, a status that has partly been obscured by the fact that its 
output tends to be organized regionally rather than alphabetically. The Digital Exposure 
of English Place-Names (DEEP) project, funded by JISC and hosted by the University 
of Nottingham, is working to bring the contents of the individual EPNS county survey 
volumes together into a single, fully searchable dataset. This will enable users to inter- 
rogate the philological data of the entire country’s place names in a variety of ways, 
much as searches can be conducted in online dictionaries like OED Online, with links 
to Ordnance Survey digital mapping applications enabling the reader to map particu- 
lar name types, forms, or elements across the country. The first product of this work, 
the completed Historical Gazetteer of England’s Place-Names (<http://placenames.org.uk/>), 
shows some of the promise of this approach. 


CHAPTER 16 
16.1 SOME GENERAL OBSERVATIONS 


ANTHROPONYMS lie at the fringes of the lexis, where denotation can disappear alto- 
gether and meaningfulness exist only in social connotations or individual personal 
associations. It is the social and personal function of these names, and in most cases the 
antiquity of their origin, that make them both fascinating to the public and exceedingly 
difficult for lexicographers. A fundamental problem is an inadequate supply of relia- 
ble research, hardly surprising in view of the need for researchers to be familiar with 
a great range of onomastic data (often difficult of access), to have specialized etymo- 
logical skills, and to have some knowledge of the personal and social histories of the 
name-bearers. This chapter will explore these themes through a survey of some of the 
main dictionaries of personal names and surnames in the English-speaking world. 
Some incorporate fine, original scholarship, but even these may fall short in coverage, or 
in explanatory accuracy, or more often in both. It is rare for an author of any of the more 
general anthroponymic dictionaries to be a historical linguist, experienced in onomas- 
tic research, and a competent historian as well, whether in social history (including his- 
torical demography), in local and family history, or in personal history (biography and 
prosopography). In most dictionaries aimed at the popular market, the compiler is nei- 
ther an etymologist nor a historian, and ill-informed plagiarism and fanciful guesswork 
can have free rein. The buying public have no way of telling fiction from fact, and pub- 
lishers’ claims for authoritativeness are an unsafe guide to the relative accuracy of the 
contents. 

The four main types of anthroponym (given names, surnames, nicknames, and pseu- 
donyms) are unevenly represented in dictionaries. Bookshops and catalogues of online 
booksellers are awash with a wide choice of dictionaries of given names. (In the UK 
market they were once known as christian names but are nowadays more usually called 


272 PETER MCCLURE 


first names or baby names.) The constant turnover in parents seeking guidance in choos- 
ing baby names provides a ready market for new editions, some of which are annually 
updated to reflect rapidly shifting naming fashions. In contrast, there are not many dic- 
tionaries of surnames (or family names) and there is little competition among them. The 
market for them (chiefly family and one-name historians) is smaller and relatively static 
compared to that for given names, and the stock of surnames for which explanations 
can be offered (less than half the total stock of current English surnames) has altered 
little over the last few decades, though this situation is about to change. There is an even 
smaller handful of dictionaries of nicknames and pseudonyms. 


16.2 GIVEN NAME DICTIONARIES 


Few dictionaries of given names are substantially based on the editors’ own researches, 
a fact that tends to affect the quality of the lexicographic content. The following are, in 
this respect, standard works from which other dictionaries have partly or wholly drawn 
their inventories and explanations: Withycombe's Oxford Dictionary of English Christian 
Names (ODECN, 1945, 1977); Dunkling’s Scottish Christian Names (1978); Dunkling 
and Gosling’s Everyman's Dictionary of First Names (EDFN, 1983, 1993); Ó Corráin and 
Maguire’s Gaelic Personal Names (1981); Gandhi's Penguin Book of Hindu Names (1992); 
and Ahmed’s Dictionary of Muslim Names (1999). 


16.2.1 Size and Range of Inventories 


The major change in general given name dictionaries during the last thirty years is the 
growing number of names listed. The third edition (1977) of Withycombe’s ODECN 
deals with around 1,100 names, only marginally relenting on her original policy of 
excluding ‘the flood of newly invented names in use at the present time in the United 
States of America’ (preface to the first edition, p. vii). Comparable dictionaries pro- 
duced for the UK market now run into thousands of names. Cresswell’s Dictionary of 
First Names for Cassell (CDFN, 2009) has over 10,000, and is the best of the popular 
UK-focused dictionaries. Aimed at the American market, Stafford’s 60,001+ Best Baby 
Names (STBBN, 2008) speaks for itself and is in marked contrast to Stewart's American 
Given Names (1979), which contains about 900 names. 

This remarkable expansion has two main causes. The first is the growth in informa- 
tion about what names are in use and with what frequency. Extensive lists of registered 
birth-names, with their frequencies and rankings, are now provided on the internet by 
government organizations. Fashions in naming shift with much greater rapidity than 
in Withycombe’s day, and the annual posting of new baby-name statistics on the official 
websites enables editors to revise their dictionaries each year, if they choose, and to pro- 
vide tables showing current and past rankings of the most popular names. The number 
of names actually in use (including variant spellings) is too large for every name to be 
included in a print dictionary. For example, the Office for National Statistics (ONS) web- 
site, Baby Names in England and Wales 2010, records the registration of 723,165 live births 
in that year, with 27,700 different boys’ names and 34,700 different girls’ names. A great 
many names are one-offs and understandably none of the dictionaries (including those 
that are online) attempts to list even half of those registered. It is a pity that the ONS web- 
site itemizes only the top hundred names in any year, which does not provide dictionaries 
with the statistical basis their editors might wish for in making rational and systematic 
choices. Editorial policies on inclusion and exclusion are at best vaguely alluded to and are 
more often unstated. The selection of names is presumably based on those found in previ- 
ous dictionaries, modified if at all by the most recent ONS list and by personal preferences. 


1 A historical survey of given name dictionaries can be found in Hanks (2009: 139-45). 

The second reason for larger inventories is the huge increase since World War II in 
the size and diversity of the name stock, not only in Britain but in all countries where 
English is the first language. It has been driven first of all by the creation of new names, 
especially but not exclusively in America, secondly by the nationalistic revival of Celtic 
names in Ireland, Wales, Cornwall, and Scotland, and thirdly by the introduction of 
traditional names of other cultures through immigration, such as Spanish names in 
America, and Muslim and Hindu names in the UK. The revival of Celtic names and the 
introduction of immigrant names have generated their own dictionaries, which have 
become quite numerous since the 1990s. 

Compilers of general dictionaries of given names are rarely competent etymologists 
in their own language, and never in the many others from which given names have 
derived. One solution is to draw material from specialist dictionaries that deal with a 
linguistically restricted name stock. Another is to do what Hanks and Hodges have done 
in their Dictionary of First Names for Oxford University Press (ODFN, 1990) and solicit 
contributions from experts in languages other than English, complementing the use of 
specialist dictionaries. This results in a broader linguistic range, embracing not only tra- 
ditional and more recent English names but also common names in the major European 
languages, in Arabic, and in the Sanskrit-derived languages of India. The focus of ODFN 
is wider than English or American usage, and its comparative and collaborative meth- 
odology is unusual. The 2001 edition of its offshoot, A Concise Dictionary of First Names, 
adds Chinese and Japanese names. 


16.2.2 Online Dictionaries 


Dictionaries with a global focus are increasingly common. Some are in print but most are 
online, where size of inventory is theoretically not a problem. Online dictionaries have the 
additional advantages of being able to increase their inventories at will and to offer alterna- 
tive search options. A rare scholarly example is Burch’s Irish Names from Ancient to Modern, 
which offers a large repertory of etymologized names as well as subsidiary lists, such as 
Irish names sorted by meaning, English names translated into Irish, and traditional Irish 
first names. However, databases aspiring to a global, multilingual coverage, such as Behind 
the Name: the etymology and history of first names (BTN) and Search 30,000 Baby Name 
Meanings, tend to be amateurish and unreliable, and their inventories for each language or 
culture are more limited than those in many print dictionaries, even for UK names. 


16.2.3 Dictionary Structure 


Names usually appear in a single alphabetical order, with the gender of each name 
labelled as required, but separate lists for boys’ and girls’ names are sometimes preferred. 
Occasionally, in order to help the parent make a fitting choice according to religious prac- 
tice or social preferences, name lists are grouped conceptually or semantically, in the style 
of a thesaurus. In the Dictionary of Islamic Names by Hareeri and Hafiz (2009), each major 
concept is given its own chapter of names, listed partly but not consistently in alphabeti- 
cal order. In UK and American baby-name dictionaries, names are sorted into a variety 
of supposed linguistic, cultural, or connotational categories to help parents choose the 
‘right’ name for their child. Print dictionaries like STBBN include a selection of options 
in their introductory matter. A typical online dictionary is Baby Names and Meanings, 
whose inventory of more than 18,000 names can be searched by ‘meaning’ (Angel, Beauty, 
Biblical, ...), ‘origin’ (African, American, Albanian, Anglo-Saxon, Arabic, ...), or ‘theme’ 
(cool, classic, celebrity, ...), but the linguistic information is unreliable. 


16.2.4 Spelling and Pronunciation 


Most dictionaries provide spelling variants and list common pet forms or diminutives. 
Traditional names usually have a fixed spelling but more recent coinages can be spelled 
in many ways, often at the whim of the name-giving parents, so dictionaries vary a good 
deal in their choice of both the head form and the spellings to list as variants. 

Information about pronunciation is largely confined to dictionaries dealing with a single 
non-English name stock (Gaelic, Arabic, or Hindi, for example). The absence of any 
pronunciation guide in ODFN is a significant omission, given the inclusion of so many 
names foreign to native English speakers. Nor should editors assume knowledge of the 
pronunciation of English names; few dictionaries note the two alternative British pro- 
nunciations of Caroline, for example. 


16.2.5 Name Usage 


Many dictionaries comment on the history of name usage and the customs by which 
names are chosen, either in an introductory essay or within individual entries, or by 
providing comparative tables of name popularity at different dates. Of the general and 
comparative dictionaries, ODFN offers particularly informative historical and linguis- 
tic surveys, both for European and non-European naming traditions. Nevertheless, the 
quality of historical information in this and other dictionaries can be extremely variable, 
sometimes generalizing with accuracy and precision of detail, but at other times silent, 
hazy, or misleading about the chronology of individual name usage or of fashions in 
types of name. 

Editorial awareness of scholarly research in this field tends to be poor. There is 
much reliance on the historical summaries in Withycombe's ODECN, and on those 
in Dunkling and Gosling’s EDFN (1983), which was the first to base statements about 
post-medieval name usage on a broad sampling of English Parish Registers (1600 
onwards) and of nineteenth- and twentieth-century official birth registers, both British 
and American. Although major new research has since been published, notably by 
Smith-Bannister (1997) for the sixteenth and seventeenth centuries and by Redmonds 
(2004) for the late medieval and post-medieval periods, ODFN is the only dictionary to 
have taken note of any of it. 

Pickering’s Penguin Dictionary of First Names (PDFN) is one of several that mention 
famous people who have borne a particular name, whether or not that fact has had any 
impact on the name's subsequent popularity. In dictionaries of Muslim names such 
information is regularly provided, since choice of name is strongly motivated by its prior 
use by the heroes and heroines of Islam. 


16.2.6 Origins, Etymologies, and Meaning 


Few editors have the relevant linguistic and onomastic expertise to distinguish deno- 
tation from connotation, or etymology from onomastic function, or to judge which 
etymologies are reliable for names borrowed from exotic or archaic languages. The 
problem is particularly acute for compilers of general dictionaries who, unless they con- 
sult expert etymologists, have little option but to take on trust the explanations they find 
in other dictionaries, resulting in endless repetitions of whatever errors appeared in the 
original. For etymologies of traditional English names, current dictionaries repeat those 
in Withycombe’s ODECN, in spite of the fact that later scholarship has revised many of 
them, as in the following example: 


Aldous (m.): Aldus, Aldis, or Aldous has been used as a christian name in England 
since at least the 13th C. It seems to have been almost confined to East Anglia, where it 
is still occasionally used, and where the surnames Aldous, Aldis(s), Aldhouse, &c., are 
common. Its earlier history is obscure, but there can be little doubt that it is the same 
as the Old German Aldo ‘old’, which corresponds to the Old English Ealda. Aldo took
firm root in Italy and was often latinized as Aldus (cf. Aldus Manutius the famous 
printer of ‘aldine’ editions of the classics). There were two saints of the name, Aldo 
or Aldus, an 8th-C hermit of Bobbio whose relics are preserved at Pavia, and Aldo 
Count of Ostrevant, an 8th-C Benedictine of the monastery of Hirson in Belgium.
The writer Aldous Huxley (b. 1894) is the best-known recent example of the name. 
ALDUS HR 1273. 


Withycombe’s citation of a medieval form of the name (dated 1273 from the Hundred 
Rolls) follows the practice of historical dictionaries in citing early name-forms in a sepa- 
rate text block, and is not imitated in other modern dictionaries, but while it proves that 
the name existed in the late thirteenth century it does not justify her explanation, every 
detail of which is tangential, irrelevant, or erroneous. English Aldus is not a latinized 
form of a male name but a vernacular pet form of a female name beginning in Ald- (such
as Middle English Aldith, Old English Ealdgyð) using the Middle English hypocoristic
suffix -us. It is this name, not a latinized version of Ealda, that gave rise to the surnames
Aldous, Aldiss, and Aldhouse.² Aldus was not confined to East Anglia but widely used in
the north-east midlands and Yorkshire into the fourteenth century, when it probably 
died out (see Smith 1961-3, V: 45). Huxley’s forename is probably a transferred use of 
the surname. This information has been in the public domain for the last fifty years, yet 
Withycombe’s explanation continues to appear in all reputable dictionaries that include 
the name. 

Some of the dithematic names of the Germanic languages have become espe- 
cially popular in the English-speaking world, but modern dictionaries rarely explain 
them accurately. Partly this arises from Withycombe’s reference to (sometimes hypothetical)
Old English etymons for names that have cognates in Old Scandinavian or
Continental Germanic. English Ralph, Reynald (Reginald), Richard, Robert, and Roger
are all Norman French forms of Continental Germanic names, as Withycombe recognized,
but in many recent dictionaries they are mistakenly given the Old English
or Old Scandinavian etymons that Withycombe had tentatively cited only as possible 
alternatives or antecedents. Another problem arises from the mistaken wish to explain 
Germanic dithematic names as meaningful phrases. As Stenton remarked many years 
ago ‘the men who coined the names Friþuwulf “peace-wolf” and Wigfrið “war-peace”
were not concerned about their meaning’ (Mawer and Stenton 1924: 168). Withycombe 
was careful to gloss each element of such compounds separately. She explains Roger, for 
example, as a compound of Germanic hrothi ‘fame’ and ger ‘spear’, but many later dictionaries,
such as Room’s Dictionary of First Names and PDFN, attempt to make sense of
it as ‘famous warrior’, while BTN opts for ‘famous spear’ and Turner’s Baby Names 2012
(BN2012) for ‘spear man’. Transferred surnames are a fairly common source of modern
given names, as illustrated by Aldous, but PDFN’s interpretation of Hilton as ‘English
first name meaning “from the hill farm” ’ exemplifies a not-uncommon confusion of the
etymology of the place name, the sense of the derived surname, and the meaning of the 
given name. 

Explaining recently invented names, especially for girls, is a problem for all editors. 
Many are blends of phonaesthetic syllables borrowed from other names (certainly Jolene
and probably Jaden, for example, as Cresswell suggests in CDFN), while others have several
possible derivations, none of which can be proved. Kayleigh (in various spellings)
is explained in different dictionaries either as a transfer from an Irish surname (either
Keally or Kelly), an altered form of another given name (such as Kylie and Callie), a compound
of Kay and Leigh, or an extended form of Kay with the female name-suffix -lee.
BN2012 provides the fanciful gloss ‘American, meaning “pure” ’ and STBBN gives ‘(Irish)
form of Kaylee: open’.


2 See DBS and DES, s.n. Aldous.

In the inventiveness of their explanations, these last two dictionaries are typical
of many baby name dictionaries, especially online dictionaries that belong to
mother-and-baby care websites like <www.babyhold.com> and <www.gurgle.com>.
In this child-centred world, what most parents want are not etymologies but
auspicious meanings, no matter how inauthentic. In STBBN Stafford’s erroneous 
explanation of Richard as ‘(English) wealthy leader’ avoids the scholarly etymology 
given in ODFN (which she lists in her bibliography) in favour of the sort of word 
association that she uses to gloss the American names Moon as ‘dreamy’ and Hilton
as ‘sophisticated’. 

Even in the more reputable dictionaries, unfamiliarity with the full range of ono- 
mastic scholarship is widespread and is usually accompanied by an unwillingness to 
acknowledge wholesale or even partial indebtedness to previous given name dictionar- 
ies and by publishers’ disingenuous claims that their dictionaries are ‘authoritative’ or 
‘definitive’. ODFN is a rare example of the good practice of providing a full bibliography
and acknowledgement of expert advice. 


16.3 SCHOLARLY DICTIONARIES AND GLOSSARIES OF MEDIEVAL GIVEN NAMES AND BYNAMES


There is an important body of dictionary-style monographs of medieval given names 
and bynames (second names which may or may not function as hereditary surnames). 
Many are doctoral dissertations from Swedish universities, and have an intended 
readership of specialists in the languages of medieval Britain and the near continent. 
They follow the practices of historical dictionaries, presenting dated, documented 
name-forms as evidence for etymological and other linguistic interpretations. Selection 
of names is governed by etymological criteria, and is to some extent systematic, focusing 
on names of one language of origin (Continental Germanic given names, for example) 
or of one formal or semantic category (Middle English compound nicknames, for exam- 
ple). Name-forms are excerpted from a defined set of records, usually for a single county 
or a selection of counties, but the inventories are rarely comprehensive, since inclusions 
and exclusions depend on personal editorial judgements about names that are formally 
ambiguous or obscure. Two regrets are that there are not more studies of this kind and 
that, in identifying appropriate etymologies, attention is seldom paid to the personal
contexts in which names arose and were used (see Clark 1995). The most recent of them 
(Insley 1994) is a model of linguistic and onomastic method, in which philological argument
is combined with a precise attention to scribal and dialectal variation and to the
social and personal contexts of the name-bearers. 

Most of the monographs take the form of alphabetical dictionaries, but Fransson 
(1935) and Thuresson (1950) are etymological thesauri, in which occupational bynames 
are grouped semantically according to types of occupation (for example, ‘Cloth 
Workers’) and then into sub-categories (‘Flax-dresser, Comber, Carder’, ‘Spinner,
Roper’, and so on). Alphabetical indexes of the Middle English head forms are provided.
The usefulness of this approach for lexicologists and historians is self-evident, in spite 
of some unavoidable guesswork in defining senses for un-contextualized, formally 
ambiguous names. 

Here also belong a small number of corpus-based glossaries which stand in the same 
relationship to editions of onomastic records (like tax rolls or guild membership rolls) as 
word glossaries do to editions of literary texts. The most recent example is an edition of the 
Durham Liber Vitae (Rollason and Rollason 2007), where the given names are etymolo- 
gized alphabetically within categories of linguistic origin or transmission, and the bynames 
within one of five semantic types (nicknames, occupational names, relationship names, 
topographical names, and toponymic names). An index relates every name form to its tex- 
tual occurrences, etymological entry, and prosopographical entry (where there is one). 

Such collaboration by a team of historians, prosopographers, and etymologists 
is a rare acknowledgement of the interdisciplinary nature of the best anthroponymic 
research. Unfortunately, the esoteric nature of this edition and of the monographs 
mentioned above means that they are consulted by few editors of modern anthropo- 
nymic dictionaries. Two exceptions are Reaney’s Dictionary of British Surnames (DBS, 
1958), which made good use of the monographs that had appeared by the early 1950s, 
and Hanks’ and Coates’ Dictionary of Family Names in Britain and Ireland (FaNBI, 
forthcoming). 


16.4 SURNAME DICTIONARIES 


Unlike given name dictionaries, the contents of most surname dictionaries derive to a 
significant degree from the research of their editors or editorial teams. Standard works 
of this kind exist for all the countries of the UK and Ireland: Reaney’s DBS, expanded 
by Wilson as a Dictionary of English Surnames (DES, 1991); MacLysaght’s Surnames of 
Ireland (SI, 1985); Black’s Surnames of Scotland (SoS, 1946); the Morgans’ Welsh Surnames 
(WS, 1985); the Rowlands’ Surnames of Wales (SW, 1996, 2013); and Hanks’ and Coates’ 
forthcoming FaNBI. The benefits of local research in family names are especially evident 
in a number of smaller scale dictionaries like Kneen’s Personal Names of the Isle of Man 
(PNIM, 1937), Bell’s The Book of Ulster Surnames (US, 1988), and Redmonds’ Dictionary
of Yorkshire Surnames (DYS, forthcoming in 2015). Dictionaries of continental surnames
written in English are rare: Cassar (Surnames of the Maltese Islands, 1993) deals compre- 
hensively with over 1,000 surnames of the Maltese Islands, and for the American mar- 
ket Beider (Dictionary of Jewish Surnames from the Russian Empire, 1993, Dictionary of 
Jewish Surnames from the Kingdom of Poland, 1996) has published monumental surveys 
of tens of thousands of Jewish surnames in Russia and Poland. Since the quality of the 
research methodology in these works affects the quality of the lexicography, this will 
form a significant topic for later discussion. The remaining dictionaries to be mentioned 
are largely dependent on material in the research dictionaries but can include signifi- 
cant numbers of new entries and explanations: Cottle’s Penguin Dictionary of Surnames 
(PDS, 1967, 1978); Titford’s Penguin Dictionary of British Surnames (PDBS, 2009), 
a revision of PDS; Hanks’ and Hodges’ Dictionary of Surnames for Oxford University 
Press (ODS, 1988); Dorward’s Scottish Surnames (SS, 2003), and Hanks’ Dictionary of
American Family Names (DAFN, 2003).


16.4.1 Size and Range of Inventories 


Most users of surname dictionaries are looking for explanations of their own family 
names. If your name is Irish, Manx, Scottish, or Welsh, you are almost certain to find it 
in one of the specialist dictionaries devoted to them. MacLysaght’s SI, for example, lists 
over 4,000 names, including spelling variants, and claims to include practically every 
known Irish surname except for ‘recent introductions’. If your name is of English origin
you may not be so lucky. The English name stock is much larger, has a more complex history,
and is more difficult to research. Reaney’s and Wilson’s DES, which contains more
names than any other UK dictionary, provides explanations for over 27,000 names and 
variants, mostly English but including some Scottish, Irish, Welsh, and Manx names 
(see Tucker 2008: 15). This is considerably more than the 16,000 claimed by the publish- 
ers on the dust cover, but excludes thousands of long-established English names that are 
still in use. The size of the exclusion can only be guessed at from data gathered from elec- 
toral rolls and censuses, showing that there are more than 370,000 different surnames in 
Britain today, of which over 300,000 are names known to be of recent (mostly twenti- 
eth-century) immigrants (Hanks et al. 2012: 37). Even if a substantial proportion of the 
remaining 70,000 or so also prove to be non-English in origin, it is clear that no diction- 
ary of English surnames provides anything like a comprehensive coverage. FaNBI con- 
tains about 46,000 names, including many belonging to recent immigrants. Of course, 
most of the missing names are uncommon, but then the majority of UK surnames are 
borne by relatively few people, while only a small percentage are borne by many; one
estimate suggests that the names in DES cover as much as 77 per cent of the population 
recorded in the 1998 electoral roll for Britain (see Tucker 2008: 26). 


3 A useful historical survey of surname dictionaries can be found in Hanks (2009: 124-37).

Surname dictionaries seldom tell readers about the extent and rationale of their coverage.
In a few, the inventory is explicitly corpus-based, including all names recorded
in a specific set of historical documents, or at least selecting those that have survived to 
the modern day, or in some cases preferring those that are still ‘common’ (whatever that 
means). Similarly, terms like ‘rare’ and ‘recent’ are frequently used to justify particular 
exclusions but they are seldom defined. In DBS, DES, and most other dictionaries, there 
is complete silence on how the inventory was compiled. We can suspect that telephone 
directories were a major source and hope that selection was influenced by numerical 
frequency, but this was certainly not so in Reaney’s idiosyncratic selection of Scottish, 
Irish, and Welsh surnames. 

The exercise of personal editorial preferences can result in misleading inclusions 
and exclusions. Wilson added nearly 9,000 names (including variants) in his third edition
of Reaney’s work, but around 3,000 of them are ‘ghost names’, which are recorded
neither in the 1881 Census nor in the 1998 electoral rolls (Tucker 2008: 23-7). Some of 
them are evidently editorial inventions derived from medieval bynames with interesting 
etymologies but which probably never became hereditary surnames (Redmonds 2005: 
47). Most surname dictionaries, like DES, exclude names for which the editors have no 
explanations, regardless of how common they are. 

Three dictionaries by Hanks stand out for having selected names on rational criteria 
consistently applied to a systematic survey of current names. ODS is a comparative work 
containing nearly 70,000 entries for most of the major surnames of European origin, as 
well as for many rarer ones, that occur in English-speaking countries. According to the 
Introduction, the selection of entries was based primarily on frequency counts in repre- 
sentative telephone directories, noting all names with a frequency of over fifty subscrib- 
ers in British and European capitals and those with a frequency above twenty subscribers 
in selected regions of the UK and Ireland. All such names appear in the inventory, even 
if the name could not be explained, so long as the country and region of use were known. 
Inclusion of rarer names depended on the availability of useful and reliable information.
Hanks’ DAFN and FaNBI pursue similar principles. The main difference in DAFN
is the use of computer technology to analyse the representative samples of telephone 
subscribers, leading to the inclusion, for the first time in an English-language surname 
dictionary, of the frequency of each name within the total sample. FaNBI systematically 
includes all names with more than one hundred bearers listed in the 1997 electoral rolls. 
It provides the frequency of each name in 1997 and 2011, and (where relevant) in the 1881 
British census. 


16.4.2 Dictionary Structure 


Most surname dictionaries are self-referencing, every name appearing as a separate 
entry in alphabetical order, with some acting as main (or anchor) entries containing 
explanations and lists of variants, and others as variant entries with a cross-reference 
to the main entry. However, finding the name you want is not always straightforward.
DES frustrates the reader by sometimes locating variants only at the main entry without
listing them again in their alphabetical place with a cross-reference. In WS, the main 
entry is often a Welsh etymological form such as Ieuan, and one may have to read a
substantial essay on Welsh phonology before locating the target surnames (in this
instance, Evans and Bevan, as well as all the English equivalents like Jones that derive from
the corresponding English name John). In US, only major variants appear as entries with 
cross-references, while minor variants are listed in an index of all names, keyed to page 
numbers. The international, comparative content of the entries in ODS is unsuited to 
self-referencing; instead there is a comprehensive index of all name-forms, keyed to the 
main entries, where all variants are grouped under formal categories such as cognates, 
diminutives, and patronymics. 


16.4.3 Spelling and Pronunciation 


Not many surnames have a standard spelling, so choice of main entry varies from dic- 
tionary to dictionary. The common, and for most readers probably the best, option is 
one of the more frequent spellings, though there may be reasons (perhaps etymological) 
for making exceptions. DES is on its own in (almost) regularly selecting as head form 
the first in the alphabetical sequence of variants, even when it is a rare (and occasion- 
ally an obsolete) one. Many dictionaries list variants at the main entry, but there is no 
agreed practice on ordering them. DES and PDBS, for example, list them alphabetically 
although, for editorial convenience perhaps, some of Wilson's additions in DES follow 
a Reaney DBS head form in non-alphabetical sequence. SS, on the other hand, seems to 
have no policy at all. 

Spellings are often an unreliable guide to surname pronunciation, due to the irregu- 
larities and ambiguities of standard English orthography, the absorption of non-English 
surnames into English-speaking communities, the fossilization of older spellings in 
many modern surname forms, and the dialectal character of much surname usage. 
Nevertheless most dictionaries largely ignore pronunciation, probably mainly for lack 
of information on either current or historical usage. Kneen’s PNIM is unusual in provid- 
ing phonetic transcriptions. Some dictionaries (WS, SoS, SS, PDBS, FaNBI) offer occa- 
sional guidance, for example on some of the names with -ow- (like Howell, Powell, Thow, 
Bowie, Bowser, and Dowland), and on the modern unhistorical final stress in some of 
the names ending in -ell (like Waddell and Mantell). 


16.4.4 Origins, Etymologies, and Meaning 


Most dictionaries provide useful historical introductions that reflect the editor's inter- 
ests and skill sets. The appropriate context for explaining the origins of surnames is a 
combination of linguistic history and local and family history, but some editors of sur- 
name dictionaries are English etymologists with little knowledge of genealogy or family 
history (Reaney and Cottle, for example). Some, like Titford, are genealogists or family
historians with a weak grasp of etymology. Others, usually Celticists, show some knowl- 
edge of both subjects (Kneen, Black, MacLysaght, and the Morgans), but not enough 
of one or the other to avoid significant errors, especially at the interface of Celtic and 
English names. 

Homonymy is a more pervasive and more challenging problem in surname diction- 
aries than in any other area of lexicography. Lack of defining contexts for medieval 
bynames or surnames hinders the establishment of true etymological and semantic 
identities (see McClure 2013). Lack of genealogical information is similarly troublesome, 
especially in the post-medieval centuries, when surname identities became increasingly 
confused by divergence and convergence of name-forms, much of it unpredictable. I cal- 
culate that more than 60 per cent of the explanations in DES are probably inaccurate for 
these reasons. Surname explanation is a treacherously difficult field, bedevilled by too 
little and too much data, much of it obscure, and so under-researched that it is impos- 
sible at present for any dictionary of British, Irish, or American surnames to escape 
publishing a great deal of misinformation if its inventory ventures beyond the com- 
monplace. One feels, nevertheless, that some of them could have tried harder to keep 
abreast with research that corrects errors in the standard reference works (for example 
McKinley 1977, 1981, and 1988; Redmonds 1997; McClure 1998, 2003, 2005; Hey 2000; 
Tooth 2000-10). There is much copying from one work to another, and long-disproved 
etymologies continue to appear in the latest dictionaries. Some of Reaney’s explanations 
of Irish, Welsh, and Jewish names are deficient and distorted due to anglocentrism (see 
Hanks et al. 2012: 40). Many of Wilson’s additions to Reaney’s work in DES are etymo- 
logically unsound. Problems like this are more likely to occur when editors work sin- 
gle-handedly. The collaborative approach of the Hanks dictionaries (ODS, DAFN, and 
FaNBI), using expert consultants in the specialist linguistic areas, is an important step 
in the right direction, as is his employment in FaNBI of a team of editors, including etymologists
and historians.

The disambiguation of surname homonyms can only be achieved by a comparative 
methodology, enabling variant forms of a single name to be established in their local 
contexts across a range of medieval and modern records (see McClure 2013). The FaNBI 
project has made significant progress towards this goal by searching a selection of the 
tax records and parish registers that are easily accessible in print or online (Hanks et 
al. 2012), but this leaves much relevant data unexamined. Before any dictionary of UK 
surnames can offer definitive explanations for the majority of the native name stock, 
there need to be comprehensive investigations of family names in the local records of 
every county. To date only Yorkshire (Redmonds DYS) and north Staffordshire (Tooth 
2000-10) have received anything approaching that. 

Even when a plausible etymon has been identified, the meaning of a name can often 
only be guessed at, since the contexts of the original usage are unknown. The problem 
affects all classes of surname to some degree but nicknames most of all, and diction- 
ary editors can be reluctant to acknowledge the uncertainty of the etymologies and 
meanings they propose. Hanks and Hodges are an exception: ‘we have not been afraid 
to speculate using words like “probably” and “possibly”, knowing that more detailed
research may yet prove our speculation wrong’ (ODS: xiv). In spite of this policy, there 
remains a great deal of unacknowledged guesswork there, unsupported by evidence. 
One piece of speculation that needs researching more fully is Reaney’s explanation of 
names like Bolt and Hood as metonymic occupational names, synonyms of Bolter ‘bolt-maker’
and Hooder ‘hood-maker’. The unquestioned assumption that ‘metonymic
occupational names (e.g. Cloke for a cloak seller)... were common in Middle English’ 
(Hanks 2009: 130) is frequently relied on in surname dictionaries, but the evidence for it 
is slight and ambiguous. Names apparently derived from nouns denoting products are 
better explained as nicknames, which may sometimes have arisen in occupational con- 
texts but could equally well reflect some other association (for example the distinctive 
wearing of a hood or a cloak; see McClure 2010: 214-15).

Surnames are normally categorized in four or five broad semantic types, which are 
used by some dictionaries as explanatory labels. In PDS and PDBS, they are reduced to 
the initial letter of each label; in PNIM three main types and seven subtypes are indicated
by roman numerals. DAFN and FaNBI attempt to refine their classifications with
additional sub-categories, but no labelling is altogether satisfactory, partly because differ- 
ent scholars use different terminology and definitions, but mainly because no system of 
classification can accurately register the polysemous or ambiguous nature of many early 
surnames. 

In Hanks’ dictionaries every sense entry begins by defining the culture (Jewish, 
Huguenot, Norman, Chinese, for example) or the language and/or culture of the sur- 
name (using terms like English, Scottish, German, Italian, which can apply to either). 
This is often a helpful signpost for the non-specialist reader, and it works reasonably well 
in ODS and DAFN, where the terms mostly point unmistakably to a particular country, 
region, or culture of origin. It is more problematic in FaNBI, where terms like Irish or 
Welsh can be ambiguous when applied both to names of Irish or Welsh etymology and 
to English-language names that may have originated in Ireland or Wales or have had a 
long presence there from English migration. 

Alternative senses are sometimes given as numbered etymological options. In DAFN 
they are sometimes accompanied by diagnostic forenames, an innovative method of 
identifying the degree to which an American surname can be attributed to alternative, 
linguistically differentiated etymologies and countries of origin. In DES and FaNBI 
numbered senses are usually attached to discrete sets of early name-forms (see Section 
16.4.5), but Reaney and Wilson can often be shown to have assigned ambiguous forms to 
the wrong etymon, and the same will prove true for many entries in FaNBI when further 
research is done. 


16.4.5 Presentation of Onomastic Evidence and Etymologies 


In ODS, DAFN, SI, and US, definitions are given without citing onomastic evidence; 
PDS and PDBS also dispense with etymons. Most research-based dictionaries, however, 
usually cite dated, documented name-forms, and rightly so, contrary to the surprising
claim in ODS (p. xv) that their value is overstated.

There are two styles in research-based dictionaries, one broadly based in local and 
family history, and the other philological, where the emphasis is etymological. Generally 
speaking the local history approach, where the relevant name-forms are embedded 
within a discursive explanation, tends to be more intelligible to the ordinary reader 
than the philological. There are some beautifully written entries in Black’s SoS and 
Redmonds’ DYS, for example, where the name-forms and the people bearing them form 
part of a coherent historical narrative tracing the origins and developments of family 
names in particular localities. The philological approach derives largely from the lexi- 
cographical practices of word dictionaries like the OED. Thus in DES, PNIM, WS, and 
FaNBI the name evidence appears in a separate text block, attached to a particular sense 
or explanation. The implication is that the names in each block justify the proposed ety- 
mology and link that etymology to the modern surname. However, whereas in OED 
word usages are cited within verbal contexts that help to define the sense, in DES and 
FaNBI the surname citations rarely have any defining contexts, apart from references to 
time, place, and document. The assignment of name-forms to one sense or another in 
discrete blocks is therefore often arbitrary, a fact explicitly recognized and allowed for 
in many FaNBI entries. As an explanatory device, however, it is often too rigid to deal 
effectively with the actual complexities of surname history, and can produce ill-focused 
compilations of citations, of uncertain relevance to the name in question. Nevertheless, 
since the research that underpins the local history approach to surname explanations 
has been done for relatively few names or counties (especially in England), the philo- 
logical approach is currently the only viable one for a general dictionary that cites early 
name-forms. 

In DES and FaNBI the presentation of early name-forms follows the common lex- 
icographic practice of listing items in date order, regardless of the geographical loca- 
tion of the name-forms. In the light of what is now known about the local nature of 
surname—or rather family name—development and variation (see Section 16.4.7), it 
would be better practice to organize the data initially by place (region, county, or parish 
according to the needs of the explanation) and within that by date order, to show how 
the family name developed linguistically over time in the places where it ramified. 

A further problem with the philological style of explanation arises from heavy abbre- 
viation of documentary references which, if over-used, can produce an elliptically dense 
text, such as this from WS in the entry for Bychan, fem. Bechan (Vaughan etc.), where
an undiscriminating typography and punctuation contribute to the opacity: 


B13. 216, SR 1292. Phillip ap Adam Waghan; Gorgeneu Vaghan. 226. Iuan vochan ap 
Donewal. 228. Iuan Vachan; Iuan ap Iuor Vochan. B15. 140. NW Edw. I. Yereward 
Vahham. AnglPleas 34. T. Wyn ap Gruffith Vichan.


Other instances of scholarly telegraphese occur in appositional lists of linguistic ety- 
mons, often from different languages, with no explanation as to whether they are 
cognates or a historical sequence of loans from one language to another. This is a common
practice in DES. For the English surname Asser Reaney offers:


ON Ǫzurr, ODa, OSw Azur, Welsh Asser.


For all its scholarly apparatus, DES is aimed at a general readership, which cannot be 
expected to make much sense of such linguistic shorthand. The elliptical explanatory 
style has an authoritative appearance but is often inadequate in content and can be mis- 
leading (the surname is not Welsh). 

As in the entry for Asser, Reaney and Wilson regularly omit the Middle English ety- 
mon from which a surname is actually derived, if antecedent etymological forms can be 
cited. In the following entry the appropriate etymology would be ME botwright ‘boat 
maker’: 


Boatwright, Botwright: John Botwright 1469 SIA xii; John Botewrighte 1524 SRSf. OE 
bat ‘boat’ and wyrhta ‘wright’ 


In this and countless other instances our attention is diverted away from the language 
in which the surname was formed (Middle English) towards the antecedent languages 
from which Middle English inherited its vocabulary and name stock. It is a skewed 
focus, which has led to some questionable practice when giving glosses for archaic per- 
sonal names that are the etymons of much later surnames. The DES gloss for Gunter is a 
typical example: 


OFr Gontier, OG Gunter ‘battle-army’ 


Many readers like the extra information, but if learned editors condense it in formu- 
laic summaries, with no chronological or cultural contextualization, they risk readers 
confusing the etymology and sense of the surname with those of its source name. The 
practice also occurs in popular dictionaries. No editor is more elliptical than Cottle (in 
PDS, where F stands for ‘First name’), who unhelpfully condenses Reaney’s gloss even 
further to 


F ‘battle army’ Germanic. 


This is hardly intelligible as an explanation of the English surname. In PDBS Titford 
attempts to make amends by expanding Cottle’s gloss to 


F From a Norman first-name meaning ‘battle-army’


but misrepresents the Germanic personal name as a meaningful compound (see
Section 16.2.6). Only Hanks (ODS) accurately re-interprets Reaney’s original by keeping
the etymologies of the English surname and the Norman (Old French) personal
name distinct and by correctly implying that the personal name was an archaic arbitrary
compound:


English: from the Norman personal name Gunter (OF Gontier), composed of the 
Gmc elements gund battle + heri, hari army. 


The succinct clarity of the explanation is a great improvement and is characteristic of
Hanks’ dictionaries, but it remains a moot point whether the etymology of the personal 
name enriches readers’ understanding of the surname or confuses it. I have similar mis- 
givings about giving etymologies for place names that were no longer lexically meaning- 
ful at the time when hereditary surnames were coined from them, even when it is done 
with the lucidity of ODS and DAFN. 


16.4.6 Selection of Onomastic Evidence 


Name-forms are chosen in order to justify an etymology or to show continuity and 
change of name-form or spelling over time. The selections in DBS and DES raise impor- 
tant methodological issues relating to the use of medieval and post-medieval evidence. 
Although Reaney’s collection of medieval data is extraordinarily large for a single- 
handed, manually transcribed archive, it is too small, too disparate, and insufficiently 
contextualized to provide accurate etymologies for many of the names he attempts to 
explain. Medieval bynames seldom occur in contexts that point unambiguously to the 
lexical identity and meaning of the name (McClure, forthcoming). They frequently
appear in spellings which are phonologically ambiguous, or they may be unreliable 
because of scribal or editorial mis-copyings and misunderstandings. Similar spellings 
may represent entirely different names in different places. It makes sense, therefore, to 
collect alternative forms or spellings of the same name in the same community, if pos- 
sible in those localities (the counties, at least) that are relevant to the distribution of the 
modern surname. It is also important to take into account any information about the 
familial, occupational, socio-economic, and biographical (or prosopographical) con- 
texts of the name-bearers. This locally focused, person-centred approach, relevant to 
all types of medieval byname, was successfully pioneered by Ekwall (1951), but had little 
impact on Reaney’s methodology or on that of most of the scholarly, dictionary-style 
monographs on Middle English given names and bynames that have since been published
(see Clark 1995: 110-13). They treat the name as ‘word’ rather than ‘person’, as ‘a
manifestation of linguistic form rather than social life’ (McClure 1981b: 101). Reaney’s
dictionary method, taking isolated name-forms from counties geographically distant 
from each other and unrelated to modern distribution, was bound to produce many 
mistaken etymologies. 

The paucity of post-medieval name evidence in Reaney’s dictionary is a further
sign that he ignored the local nature of family name history and especially the pro- 
found impact that folk etymology and dialect pronunciation had on the development 
of surname forms during the five hundred or more years since most surnames became
hereditary (particularly the growth in homonymy discussed in Section 16.4.4). Reaney’s 
was not designed as a dictionary of current family names, but as a collection of medi- 
eval bynames linked, often more by guesswork than by evidence, to modern surname 
forms. Similar problems are less frequent in the Scottish and Manx dictionaries, where 
post-medieval name-evidence and attention to the local development and distribution 
of family names are essential underpinnings of the explanations offered. 

In this respect as in many others, FaNBI represents a substantial advance on Reaney’s 
methodology. The foundation of the database is DES, which has been radically revised 
and expanded by the addition of thousands of dated name-forms from published and 
unpublished records, especially (but not exclusively) post-medieval. The majority of 
the name-forms have been chosen specifically to illustrate the historical presence of the 
surname in its relevant locations in the medieval and the modern periods. However, 
within the dictionary project’s funding period (2010-14) it proved impossible to revise 
all entries with this degree of fullness, so there is considerable unevenness in the quantity
and quality of the onomastic evidence between different entries.


16.4.7 Family History and Surname Geography 


It is not the business of surname dictionaries to provide family histories or genealogies, 
but they need to draw on them in order to explain surname history. Family ramification 
and migration are a principal subject matter of dictionaries like SI, US, and DYS. Some 
popular dictionaries (ODS, DAFN, and PDBS) offer short biographies of famous bear- 
ers of a surname, some of which are of little relevance to the explanation, while others 
provide information about family migration or illustrate the name's characteristic geo- 
graphical distribution. This kind of information is rare in DES. 

The systematic association of English family names with localities was first proposed 
by Guppy (1890). It is now universally recognized as essential information in estab- 
lishing surname origins, and surname mapping has become an important research 
tool, alongside information from family history, local history, and the new discipline 
of historical genetics (Hanks 1992-93; Hey 2000: 135-60; Redmonds et al. 2011: 84-105,
148-93). Recent research has shown that most UK surnames have remained in the 
same geographical area over many centuries, and that large numbers of them probably 
derive from a single medieval progenitor.⁴ This has profound implications for establishing
reliable etymologies. Reaney (unlike Cottle) disregarded Guppy, supposing
that a linguistic similarity between a medieval and a modern name was sufficient to 
produce a plausible menu of explanations, with little need for distributional or gene- 
alogical information. The extent of Reaney’s error was exposed with telling detail in 
Surnames and Genealogy (Redmonds 1997), which drew on a lifetime's study of family 
names and local history in the West Riding of Yorkshire. His general findings are borne
out elsewhere, in county studies by Tooth (2000-10) and in the English Surname Series
(McKinley 1988, for example); in country-wide investigations by Rogers (1995) and
Hey (2000); and in the researches of some members of the Guild of One Name Studies,
such as Rubery (2011).


4 For a summary view see Redmonds et al. (2011: 215-17).

Rogers’ and Hey’s work developed in part out of the publication in the 1990s of 
CD-ROMs of British telephone subscribers and an index of the 1881 Census Returns, 
which made surname distribution in Britain much easier to plot. In ODS and DAFN, 
statistical samplings from British, Irish, European, and American telephone directories 
are the source of occasional distributional information. The only other UK-based dic- 
tionaries to incorporate this kind of information are Titford’s PDBS, Hanks’ FaNBI, and 
Redmonds’ DYS. Their source is the British 19th Century Surname Atlas (Archer 2003),
a CD-ROM which maps the distribution and frequency of every surname recorded in a
digitized transcription of the 1881 Census data. It enables Titford to offer some improved 
derivations of toponymic surnames and Redmonds to trace the origins and migrations 
of names into and out of Yorkshire. It has also pointed the way to many new explana- 
tions of names in FaNBI, which unlike PDBS provides information on frequency and 
distribution for every name. Surname mapping also enables more accurate attribution 
of variants than Reaney’s guesswork could achieve. Some of Reaney’s groups of variant
forms prove not to be variants at all but are independent coinages or belong to other 
names. The ideal dictionary would include a map for every name, and some may appear 
in the online edition of FaNBI. For lack of any in the printed edition, FaNBI summarizes 
the main location(s) of each name in words, by no means an easy task, since distribu- 
tions can be complex and difficult to generalize. The brief verbal descriptions neverthe- 
less provide valuable information not to be found in any other current dictionary. 
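
The kind of tabulation that lies behind such a verbal summary can be sketched in a few lines of Python. This is a minimal illustration only and does not reproduce the method of FaNBI or of the Surname Atlas: the records are invented (they are not census figures), and the function name `main_locations` and the 30 per cent threshold are assumptions made for the example.

```python
from collections import Counter

# Invented (surname, county) records standing in for census transcriptions.
RECORDS = [
    ("Botwright", "Suffolk"), ("Botwright", "Suffolk"),
    ("Botwright", "Suffolk"), ("Botwright", "Norfolk"),
    ("Gunter", "Berkshire"), ("Gunter", "Berkshire"),
    ("Gunter", "Gloucestershire"), ("Gunter", "Monmouthshire"),
]

def main_locations(records, surname, threshold=0.3):
    """Return (total bearers, counties holding at least `threshold` of them)."""
    counts = Counter(county for name, county in records if name == surname)
    total = sum(counts.values())
    if total == 0:
        return 0, []
    # most_common() lists counties from most to least frequent.
    main = [county for county, n in counts.most_common() if n / total >= threshold]
    return total, main

total, counties = main_locations(RECORDS, "Botwright")
print(total, counties)
```

A real distribution is rarely this tidy, which is why reducing it to a short phrase calls for editorial judgement as well as counting.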


16.4.8 Online Data and Online Dictionaries 


The electronic digital revolution is transforming surname research. The increasing 
availability on disk and online of searchable primary sources and indexes from the 
medieval period to the modern (particularly census returns, parish registers, wills, and 
tax records) will also have a major impact on the content of future surname dictionar- 
ies, as will the computational power to analyse and map the databases. This is already 
evident in FaNBI, which draws on a number of digitized sources, including an unpublished
electronic database of Fenwick’s edition of the fourteenth-century Poll Taxes,
the pre-2013 International Genealogical Index (validated church register transcripts 
only), the British censuses, Patent Rolls, and some National Archives abstracts. How 
best to organize and trawl such vast quantities of data will be a significant challenge for 
researchers and lexicographers. Several family history websites provide access to some
of these sources, and some of them offer surname etymologies taken from dictionaries 
and from surname distribution maps based on one of the nineteenth-century censuses. 
Online dictionaries look like an ideal format for the future. They can easily encompass 
inventories too large for modestly-priced print dictionaries; they offer the possibility 
of rapid updating in the light of new research and reviewers’ corrections; and they can
provide links to related websites. The opportunity will be seized by amateur enthusiasts 
and by commercial organizations other than mainline publishers. Already one free web- 
site (<www.surnames.behindthename.com/>) provides a moderately-sized inventory 
of surnames (worldwide but mostly English) but the selection is unrepresentative and 
many of the etymologies are ill-informed guesswork, a situation which will probably 
worsen as further expansion will be through submission of voluntary contributions. 
Hanks’ FaNBI is a major online dictionary, to be made available by subscription 
alongside a hard copy edition by Oxford University Press.⁵ With its greater coverage
(it includes names of recent immigrants and explains many native UK names for the 
first time), its many improved etymologies, and its information on frequency and dis- 
tribution, FaNBI will replace DES as the standard dictionary of surnames in Britain and 
Ireland. It is a ground-breaking achievement, if an uneven one. It leaves much work still
to be done in revising the unsatisfactory explanations in DES and in finding plausible 
etymologies for previously unexplained names. This was anticipated, and its database 
has been designed to enable further research and revision, as funds become available. 


16.5 DICTIONARIES OF NICKNAMES 
AND PSEUDONYMS 


Nicknaming is linguistically, sociologically, and anthropologically a varied and fasci- 
nating phenomenon, but its suitability for lexicographic treatment is restricted by its 
ephemeral and idiosyncratic nature and by its tendency to flourish in the private, unof- 
ficial cultures of closed social groups, few of whose linguistic and onomastic codes have 
been systematically recorded.⁶ Middle English nicknames are a partial exception, in that
they are abundantly recorded and survive in great numbers as hereditary surnames (see 
Section 16.3). 

There are two kinds of nickname that appear in popular dictionary form, generic 
nicknames and the individual or personal nicknames of celebrities and social groups. 
Some generic names (like Ginger for a red-haired person or Paddy for an Irishman) have 
remained in common currency for long periods of time, but most of them are so ephem- 
eral that the only dictionary mainly devoted to them, Franklyn’s Dictionary of Nicknames 
(1962), consists largely of names that are now obsolete. He derived most of them from 
his own experience of the armed services or from his reading of dialect and slang dictionaries.
In content and methodology it is in fact a sub-category of popular dictionaries of slang, and
conforms to the characteristics described by Julie Coleman in chapter 19.⁷ Dictionaries of
pseudonyms and of personal nicknames are only slightly more numerous. Room’s Dictionary of
Pseudonyms and Freestone’s Harrap’s Book of Nicknames have more in common with biographical
than lexical dictionaries, and the selection of names is personal rather than systematic.
Delahunty’s two dictionaries, the Oxford Dictionary of Nicknames and its popularizing spin-off,
Goldenballs and the Iron Lady, contain a not entirely coherent or well-organized miscellany of
celebrity names, generic names, and nicknames of well-known groups and businesses, such as army
regiments, football clubs, and retailers. Idiosyncratic choices and a relaxed approach to
layout, content, and indexing are not unexpected in dictionaries primarily designed to
entertain, but there is no reason in principle why nickname dictionaries cannot be
comprehensive, rigorously researched reference works. Tyas’s Dictionary of Football Club
Nicknames in Britain and Ireland (2013) shows how to do it, systematically recording dated,
fully-referenced nicknames of every professional and amateur club that the author could find in
printed and online sources, and where possible offering explanations.


5 FaNBI is a main outcome of the Family Names Project, based at the Bristol Centre for Linguistics,
University of the West of England. The project is led by Patrick Hanks and Richard Coates, and has
received two grants from the Arts and Humanities Research Council of Great Britain, 2010-14 and
2014-17.

6 See McClure (1981a).


16.6 THE FUTURE 


The interrelation of the linguistic and the personal aspects of anthroponymic history 
makes it a complex but rewarding field of study and it is likely to become the subject of
growing scholarly and public interest as the investigations of family historians, pros- 
opographers, and geneticists begin to shed more light on the origins of individuals and 
populations.⁸ Anthroponymic dictionaries will need to draw on research in these other
disciplines, which in their turn will look to given name and surname dictionaries for the 
relevant linguistic history. 

Other kinds of lexicography also need reliable anthroponymic dictionaries. Names 
are sometimes the source of new words, and names are usually coined from words or 
word elements, often at historically remote times, and so may antedate the earliest literary
example of a word or sense, or constitute the only evidence for a word's existence.⁹
A library of the best anthroponymic dictionaries offers a patchwork of accurate and 
inaccurate information, or no information at all. 

Anthroponymic lexicography is an under-resourced and under-developed art. 
Names are not words, and the standard lexicographical practices of vocabulary dic- 
tionaries need re-thinking when they are applied to anthroponyms, where personal 
and family history is entwined with linguistic history. New digital methods of research 
will enable quicker searches of primary records and rapid computerized analyses of 
large amounts of data, but these can significantly improve the extent and quality of the
lexicography only if sufficient numbers of scholars with appropriate etymological and
historical skills can be trained up to do the basic research. Online dictionaries have
much to offer. Free access is bound to be more attractive to the user than access by
subscription, and we must hope that academic and reputable commercial publishers will
not allow the internet to be dominated by self-publishing amateurs or by sites soliciting
inexpert user contributions.


7 See also Coleman (2009b). Franklyn himself edited a Dictionary of Rhyming Slang (1960).
8 See Redmonds et al. (2011).
9 For the methodology and illustrative examples see McClure (2011a, 2011b).


CHAPTER 17 
PRONOUNCING
DICTIONARIES
CATHERINE SANGSTER


17.1 INTRODUCTION 


The function of a pronouncing dictionary, or pronunciation dictionary, is to help readers
to speak the words it lists accurately. Because each entry is quite brief compared with
those in a general dictionary, specialist pronouncing dictionaries can include long lists 
of headwords in a small volume. In printed dictionaries, guidance about the pronun- 
ciation is given in the form of a transcription. In some cases this is complemented by 
the provision of sound files on an accompanying CD-ROM. Pronunciation dictionaries 
with audio are also available online. 

This chapter will offer an overview of historical and contemporary pronouncing dic- 
tionaries, and explain who uses them and why. Pronunciation lexicographers’ decisions 
about which words to include, which pronunciation or pronunciations of those words to 
select, and which accent or accents to reflect are considered, and transcription methods 
are discussed and illustrated. 

Not all languages call for a comprehensive pronouncing dictionary. Polish, 
Hungarian, or Spanish, for example, have regular, predictable syllable stress and a sim- 
ple one-to-one relationship between orthography and pronunciation. All that a speaker 
requires to pronounce every word originating in such languages accurately is knowl- 
edge of the relevant phonological rules and alphabet (including accents and diacritics). 
Guidance would only be needed for unusual items such as loanwords and place names. 

The English language’s relationship between orthography and pronunciation is far 
from simple, and its stress patterns are difficult to predict, so it is among those languages 
for which pronunciation dictionaries are of great value for native and non-native speakers
alike. Although examples of pronouncing dictionaries from other languages will be
discussed at various points, this chapter will chiefly use examples from three contem- 
porary pronouncing dictionaries of English to explore some general methodological 
issues. The three titles which will be referred to in detail are the Longman Pronunciation
Dictionary (LPD), the Cambridge English Pronouncing Dictionary (EPD), and the 
Oxford Dictionary of Pronunciation for Current English (ODP). Each of these pronounc- 
ing dictionaries uses the symbols of the International Phonetic Alphabet (see Section 
17.9) to supply more than 200,000 transcriptions of British and American English 
pronunciations. 


17.2 HISTORY OF PRONOUNCING 
DICTIONARIES 




Lexicographers wishing to communicate pronunciations to their readers need to employ 
a system of written representation. For some early lexicographers, this meant marking 
headwords with accents and syllabification (e.g. Walker's 1791 A Critical Pronouncing 
Dictionary and Expositor of the English Language). Others developed comprehensive 
systems of respelling (e.g. Murray 1888). Although Walker's dictionary, and that of the 
American pronunciation lexicographer Worcester (who produced his Comprehensive 
Pronouncing and Explanatory English Dictionary in 1830), can properly be regarded as 
pronouncing dictionaries as their titles suggest, they and other titles produced at the 
same time include more explanatory material and definition than contemporary pro- 
nouncing dictionaries of English do. For a fuller historical discussion of the work of 
these and other pronunciation lexicographers of the eighteenth and nineteenth centu- 
ries, see Beal and Sturiale (2012), Collins and Mees (2009), and, for an earlier and briefer 
perspective, Emsley (1940). See also the discussion in Pointon (this volume). For a thor- 
ough account of the pronunciation representation system developed by James Murray 
for the New English Dictionary, which was to become the OED, see MacMahon (1985). 
An important step forward for the modern pronouncing dictionary took place 
around the turn of the twentieth century, with the publication of Michaelis and Passy’s 
Dictionnaire phonétique de la langue française. This French dictionary was published in
1897, the same year as the International Phonetic Association (l’Association Phonétique
Internationale) was founded. It was the first pronouncing dictionary to use the IPA’s
alphabet, which has become a widespread, although not universal, standard for phonetic 
transcription (see Section 17.9). In Germany, Viëtor’s Deutsches Aussprachewörterbuch
(1912) also used IPA transcriptions, which continue to be used in Duden’s pronouncing
dictionary, Das Aussprachewörterbuch (first edition Mangold 1962, sixth edition 2005).
As for pronouncing dictionaries of the English language, the most significant 
moment was undoubtedly the publication in 1917 of Daniel Jones’ English Pronouncing 
Dictionary, which also adopted the use of IPA transcriptions (although Jones, with
Michaelis, had produced a pronouncing dictionary of English four years earlier, the
Phonetic Dictionary of the English Language (1913), which was in some ways a precursor
of the later work). The current EPD, one of our three focal titles, is
the 18th edition of this same dictionary. It has developed over its long life, notably with
Gimson’s phonemic transcription reforms in the 13th edition (1967), and the addition of 
American pronunciations in the 15th (1997), and continues to evolve as it approaches its 
centenary. As Collins and Mees observe in their academic review of the 17th edition, ‘for 
many years, the EPD had no real rival... until 1990 when John Wells brought out, as a 
one-man tour de force, the Longman Pronouncing Dictionary’ (2007: 213); their review 
provides a useful critical comparison of the two titles. 

The third current dictionary, ODP, entered the field in 2001, and differs in certain 
ways. ODP does not indicate breaks between syllables in its transcriptions. Each word 
is given a full pronunciation in British and American English, and it has a dedicated 
American editor, William Kretzschmar. For British English, ODP follows the Oxford 
dictionary model which uses different IPA symbols for five vowel sounds to those 
used in LPD, EPD, and elsewhere (these are discussed in Collins and Mees 2009 and in 
Pointon, this volume). 

In the United States, although some early pronunciation dictionaries of English were 
produced in the nineteenth century along the same lines as those in Europe, the two key 
titles did not appear until the middle of the twentieth. The first of these was a specialist 
dictionary aimed at broadcasters (see Section 17.5), the NBC Handbook of Pronunciation 
(Bender 1943). This gave pronunciations in both IPA and respelling transcriptions. 
Collins and Mees (2009: 195) describe it as ‘an unpretentious publication, with limited 
aims, [which] can claim credit for being the very first “home-grown” American
pronunciation dictionary of the 20th century’. It was followed one year later by a larger and
more general pronouncing dictionary, Kenyon and Knott's Pronouncing Dictionary of 
American English (1944). American English pronunciations are now routinely included 
in larger pronouncing dictionaries produced in Britain, including EPD, LPD, and ODP. 


17.3 WHO USES A PRONOUNCING 
DICTIONARY? 


Readers of pronouncing dictionaries are generally those who wish to identify the cor- 
rect (or, at least, a correct) way to say a word or phrase aloud, although the transcrip- 
tions are also useful to phoneticians and computational linguists. Native speakers look 
to pronunciation dictionaries to find authoritative information on unfamiliar or contro- 
versial words, to improve their own pronunciation, to educate or advise others, or to set- 
tle arguments. Native speaker users might include teachers of the language in question, 
broadcasters or performers, public speakers; anyone who wishes to speak with confi- 
dence that their pronunciation choices are backed up by a published source. 

Non-native speakers use general pronouncing dictionaries in broadly the same way 
as native speakers: to support their learning and to increase the accuracy of their
pronunciation of the relevant language. Depending on their level of fluency and awareness
of the language’s pronunciation patterns, a non-native speaker might wish to look up a
wider range of words than a native speaker, who might focus more on the complicated, 
controversial, or unusual words. 

Some pronouncing dictionaries of English have been prepared with a non-native 
speaker readership in mind. Windsor Lewis's Concise Pronouncing Dictionary of British 
and American English was ‘planned solely for the benefit of users of English as a for- 
eign or second language’ (1972: vi) and Wynn's Spoken English (1987) has the same aim. 
EPD, LPD, and ODP, however, are aimed explicitly at a readership of both native and 
non-native speakers, as their respective introductory matter makes clear: 


[This is] a classic work of reference, both for native speakers of English wanting an
authoritative guide to pronunciation, and for users of English as a foreign or second 
language all over the world. (EPD: iii) 

[This dictionary] is intended for use both by fluent English speakers and by learners
of the language. On the one hand, it will provide those possessing a high degree of
competence in English with a guide to the pronunciation of those uncommon words 
with which they may be unfamiliar and whose pronunciation may not immediately 
be apparent. On the other hand, it will give the English-language learner a compre- 
hensive guide to the pronunciation of the core vocabulary. (ODP: ix) 

Some of the users of LPD will be teachers and learners of English as a for- 
eign or second language, and will look for advice on how to pronounce a given 
word. ... Other users of LPD, especially those who are native speakers of English,
will be interested not only to see what form is recommended, but also what variants 
are recognized. (LPD: xvii) 


Striking a balance when it comes to entry selection, to serve the needs of both groups of 
users, is a particular challenge for editors of major pronouncing dictionaries. The mate- 
rial chosen must be broad and complex enough to interest the expert native speaker 
with a large vocabulary, without being so arcane as to alienate other users. 


17.4 FORMAT 


The usual format for a printed pronouncing dictionary is to list the entries alphabetically
by headword. Specialist pronouncing dictionaries (see Section 17.5) or glossaries might 
adopt this approach within each of a set of themed sections, for instance a medical pro- 
nouncing dictionary with separate alphabetical lists for names of drugs, diseases, parts 
of the body, etc., but more general pronunciation works have an overall A to Z structure. 

Jones and Michaelis’ 1913 Phonetic Dictionary of the English Language followed the 
1897 Dictionnaire phonétique de la langue française in the unusual practice of offering
the phonetic transcription before the headword. This was for the benefit of readers who 
wished to look up words that they had heard rather than read, and who were therefore
more familiar with the way words sounded than with how they were spelt. More
recently, a dictionary aimed at poets and songwriters, the Oxford Rhyming Dictionary
(Upton and Upton 2004), was created by inverting a pronunciation dictionary’s cor- 
pus of transcriptions to sort words into rhyming groups rather than alphabetically. The 
norm, though, for a conventional pronouncing dictionary is to order by headword. 
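
The idea of inverting a set of transcriptions so that words are grouped by rhyme can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions and does not describe how the Oxford Rhyming Dictionary was actually compiled: the toy lexicon, its simplified transcriptions, the vowel inventory, and the function names `rhyme_key` and `rhyme_groups` are all invented for the example, and a rhyme is taken to be everything from the last primary-stressed vowel to the end of the word.

```python
from collections import defaultdict

PRIMARY_STRESS = "ˈ"
VOWELS = set("aeiouæɑɒɔəɛɪʊʌ")  # assumed inventory, enough for the toy data

# Toy lexicon: headword -> simplified IPA transcription (illustrative only).
LEXICON = {
    "cat": "ˈkæt",
    "hat": "ˈhæt",
    "begin": "bɪˈɡɪn",
    "within": "wɪˈðɪn",
    "table": "ˈteɪbəl",
    "cable": "ˈkeɪbəl",
}

def rhyme_key(ipa):
    """Return the transcription from the last primary-stressed vowel onwards."""
    # Keep only what follows the last primary stress mark (or the whole string).
    tail = ipa.rsplit(PRIMARY_STRESS, 1)[-1]
    # Skip the consonants that open the stressed syllable.
    for i, symbol in enumerate(tail):
        if symbol in VOWELS:
            return tail[i:]
    return tail

def rhyme_groups(lexicon):
    """Invert headword -> transcription into rhyme -> alphabetical word list."""
    groups = defaultdict(list)
    for word, ipa in sorted(lexicon.items()):
        groups[rhyme_key(ipa)].append(word)
    return dict(groups)

for rhyme, words in rhyme_groups(LEXICON).items():
    print(rhyme, words)
```

The same headword-to-transcription data thus serves two orderings: alphabetical for the conventional dictionary, and by sound for the rhyming one.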

The method of representing the pronunciation in written form, whether by annota- 
tion, respelling, or transcription, must be explained to users. Pronouncing dictionar- 
ies generally set out their approach in the front matter, explaining the model used and 
illustrating by means of a table or key. Many titles also provide the pronunciation key 
in a second format, since the user is likely to make frequent reference to it. One recent 
edition of LPD came with a laminated card on which the pronunciation chart was repro- 
duced; it and EPD also provide a chart in the flyleaf so that users can flick back to it 
easily. Another technique to assist users first noted in Walker (A Critical Pronouncing 
Dictionary and Expositor of the English Language, 1791) was the use of a keyline, list- 
ing symbols and example words along the bottom or top margin of each double-page 
spread, a ‘helpful device whereby the key words are not buried in the prefatory pages 
but carried out and displayed on every page of the dictionary proper, where the general 
reader will notice them’ (Emsley 1940: 56). 

General dictionaries often supply pronunciation transcriptions alongside their defi- 
nitions, word classifications, and other lexicographical information; pronunciations as 
they appear in such dictionaries are discussed in more detail in Pointon (this volume). 
By contrast, modern pronouncing dictionaries’ entries are briefer, and typically consist 
of nothing but the headword and pronunciation or pronunciations, with perhaps the 
addition of occasional editorial notes pointing out proprietary status on trademarks, or 
commenting on particular curiosities. 

It is necessary, however, to supply more than just the headword for the purpose of
disambiguating heteronyms (homographs such as wind, tear, bow, and unionized, which are not
homophones). A pronouncing dictionary listing words such as contract, record, incense 
would need to indicate the part of speech for each pronunciation, since in each case the 
stress falls on the first syllable of the noun but the second syllable of the verb. In some 
entries, the part of speech would not suffice and other descriptive comment is required. 
For example: 


Dominican ‘of the D~ Republic’; ‘religious’ də ˈmɪn ɪk ən ...
Dominican ‘of Dominica’ ˌdɒm ɪ ˈniːk ən ...          LPD: 243

axes¹ plural of axis ... axes² plural of ax, axe    ODP: 74

sake cause, purpose ... sake drink                  EPD: 451


The usual contemporary practice for full homonyms (homographs which are also homophones)
is to give one entry with no additional information. This aspect of a pronouncing
dictionary’s format means that the balance of its wordlist differs significantly from
that of a general dictionary. It also means potential difficulties for users seeking a
pronunciation for a homographic word too arcane for the lexicographer to have included,
since they could reasonably assume that the pronunciation given was correct for all
possible senses of the word.
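
The entry structure described here, a headword carrying one or more pronunciations with each one labelled only where disambiguation is needed, can be modelled in a few lines of Python. This is a hypothetical sketch, not the data model of EPD, LPD, or ODP: the dictionary `ENTRIES`, the function `pronunciations`, and the simplified British-style transcriptions are assumptions made for illustration.

```python
# Hypothetical entries: headword -> list of (label, transcription).
# A label is a part of speech or a short gloss; transcriptions are simplified.
ENTRIES = {
    "record": [("noun", "ˈrekɔːd"), ("verb", "rɪˈkɔːd")],
    "sake": [("cause, purpose", "seɪk"), ("drink", "ˈsɑːki")],
    "bank": [(None, "bæŋk")],  # full homonyms share one unlabelled entry
}

def pronunciations(headword, label=None):
    """Return the transcriptions for a headword, optionally filtered by label."""
    senses = ENTRIES.get(headword, [])
    if label is None:
        return [ipa for _, ipa in senses]
    return [ipa for sense_label, ipa in senses if sense_label == label]

print(pronunciations("record"))          # both stress patterns
print(pronunciations("record", "verb"))  # the verb only
print(pronunciations("bank"))            # one form, whatever the sense
```

A lookup that returns a single unlabelled form, as for `bank` here, gives the user no way of telling whether an unlisted sense is pronounced differently.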

In our focal group of EPD, LPD, and ODP, grave supplies an example of this. EPD
offers two pronunciations of this word, one glossed as ‘accent above a letter’, which is
/ɡrɑːv/, the other ‘other senses’, which is /ɡreɪv/. Although this would be the correct
pronunciation in the obvious ‘other senses’ of a place of burial n. and solemn adj., the
musical term grave, loaned from Italian, is not pronounced in this way, but rather as
/ˈɡrɑːveɪ/. In LPD, this third pronunciation is supplied in a separate entry. ODP also
omits the musical pronunciation, but does specify that the three separate pronunciations
it gives are for accent, burial place, and adjective respectively (the two latter
pronunciations being identical).

One further exception to the general rule that entries consist only of headword and 
pronunciation applies in the case of entries for place names and other proper nouns, 
where country of origin is sometimes specified to remove ambiguity, or to account 
for an unusual, perhaps non-naturalized, pronunciation. This is done relatively sparingly
in the three English titles, but in Duden’s Aussprachewörterbuch (sixth edition,
Mangold 2005), which lists many entries from languages other than German, it is quite
widespread.

Modern pronouncing dictionaries which have gone into several editions, such
as EPD and LPD, seek to add value and interest to each successive version. Each new
edition is freshened and made more appealing by additions to and expansion of the 
format. Such expansions include the inclusion of guest essays on themes relating to 
English pronunciation in the front matter (EPD), discursive panels within the main 
text (LPD), charts showing the results of polls on the pronunciation of controversial 
items (LPD), and increasingly detailed guides to the pronunciation of sounds at the 
head of each letter section (both EPD and LPD). In a similar way, the sixth edition of
the Aussprachewörterbuch has well over 100 pages of front matter including charts, diagrams,
discussion of phonetic and pedagogic principles, and language-specific guides
to the native-like pronunciation of German (aimed at learners of the language) and of 
numerous other languages (for all users). 


17.5 SPECIALIST PRONOUNCING DICTIONARIES


This section deals with pronouncing dictionaries which offer a more limited or spe- 
cialized wordlist. Some specialist pronouncing dictionaries focus on pronunciations 
associated with a particular subject area, and others focus on pronunciations aimed 
at particular users. Besides their obvious appeal to enthusiasts and specialists in the 
relevant fields, these are also of interest to general users who feel that to mispronounce 
a key word when discussing a particular topic might undermine their authority, or
suggest a wider ignorance of the subject in question. Specialist pronouncing dictionaries
cater for such subjects as botany, for example Coombes’ 1985 Collingridge
Dictionary of Plant Names, and music, for example Clarke's 1898 Pronouncing 
Dictionary of Musical Terms and many others which followed. Certain titles are spe- 
cialist both in their subject area and intended readership; for example, Fradkin’s 1996 
The Well-Tempered Announcer is a music pronouncing dictionary aimed specifically 
at classical music radio presenters. 

Among the earliest publications of this kind were the booklets produced by the BBC
Advisory Committee on Spoken English, entitled Broadcast English: Recommendations
to Announcers, published between 1928 and 1939. These booklets instructed broadcasters
in pronunciation, following considerable discussion and deliberation by the advisory
committee. The first volume dealt with ‘Certain Words of Doubtful Pronunciation’, with
subsequent editions focusing respectively on English, Scottish, Welsh, Northern-Irish,
and Foreign Place-Names, and finally ‘Some British Family Names and Titles’, before the
advent of the war brought the enterprise, and the Advisory Committee, to a halt.

The BBC subsequently published a Pronouncing Dictionary of British Names (Miller 
1971) advising on the proper names of British people and places. The pronunciations in 
this dictionary were drawn from the BBC Pronunciation Unit's own database, originally 
researched to advise broadcasters and built on the foundations of the earlier Broadcast 
English booklets. The dictionary found a wider readership, and ran to a second edition 
(Pointon 1983). More recently, the BBC Pronunciation Unit produced a small pronounc- 
ing volume, the Oxford BBC Guide to Pronunciation (Olausson and Sangster 2006, 
henceforth OBGP) which, although it ranges more widely in theme, remains a some- 
what specialist dictionary, as the editors describe: 


The balance is shifted ... towards proper names and encyclopaedic entries. The pronunciations
we have chosen include those for famous people, capital cities ... food
and drink, scientific terms, drugs and diseases, musical instruments, composers and 
their works, and characters from literature and myth. ... We decided it best to adopt 
a magpie-like approach, informed by the usual eclecticism of a day’s work in the 
Pronunciation Unit, where we regularly flit from news stories to music to drama to 
quizzes or science programmes. (OBGP: ix) 


OBGP was not the only pronouncing dictionary of a more general nature produced 
by broadcasting corporations. NBC’s own NBC Handbook of Pronunciation (1943; 
fourth edition Ehrlich and Hand 1991) continues to be used by American broadcasters,
and is described in Section 17.2 above. Voice of America (VOA), a government-funded
international radio station, maintains an extensive glossary of pronunciations
in American English, available online at <http://names.voa.gov>, aimed primarily at
VOA’s own broadcasters, but ‘made public to help other organizations, schools, and
universities and people such as politicians and corporate executives’. Often, it seems,
material initially prepared with broadcasters’ own staff in mind ends up reaching a
wider audience.


17.6 SELECTION OF ENTRIES


The set of words and phrases given in a pronouncing dictionary is likely to differ from 
that in a general dictionary, since editors are inclined to feel that it is important to include 
a higher proportion of entries whose pronunciation is not predictable. Rare or unusual 
words might be included because of their difficulty or exotic nature, rather than omitted
because of their infrequency or obscurity, especially in dictionaries aimed at native
speakers. Loanwords and phrases borrowed from other languages feature prominently.

Pronouncing dictionaries typically include more place names, proper names, and 
trade names than general dictionaries do. Pronunciations of these can be unpredictable,
or can vary according to specific details. For instance, Gillingham in Kent is pronounced
with an initial ‘soft g’ /dʒ/, but the village of the same name in Dorset begins
with a ‘hard g’ /g/; there are various river Stours in England, ‘pronounced differently in 
different locations, and with some variation even when used in reference to the same 
river’ (OBGP: 370). Many people's names can be pronounced in more than one way and, 
although the individual possessors of a name have the right to specify how it should be 
pronounced in their own case, it is useful to list the commonest options. Names with 
their origins in foreign or ancient languages are also usually well represented. Again, 
they merit inclusion because of the likelihood that their pronunciations will be unusual 
or unexpected, or that they will be a matter of debate. 

Pronouncing dictionaries, especially those whose readership is expected to include 
speakers of the language in question as a second or foreign language, typically give the 
pronunciation for inflected and derived forms in addition to the uninflected forms. In 
the English language titles looked at in this chapter, EPD, LPD, and ODP, this involves 
giving pronunciation information about plurals, comparative and superlative forms of 
adjectives, and -s, -ing, and -ed forms of verbs, usually within a headword. Inflections
are given in all cases, rather than just those which are unpredictable or irregular. The 
three titles vary in the degree to which they include derivations (-er, -ly, -able, -ness, etc.) 
within a headword rather than as separate entries. 


17.7 SELECTION OF PRONUNCIATION(S) 


In the early English pronouncing dictionaries of the eighteenth century, one single
pronunciation deemed to be correct was given precedence. ‘Where Words are subject to
different Pronunciations, the Reasons for each are at large displayed, and the preferable
Pronunciation is pointed out’, as Walker writes on his title page. On historical practice
see further Pointon (this volume). In a similar way, general dictionaries of the present
day might settle for just giving one pronunciation from among several possible alternatives,
and this approach is also often taken by smaller pronunciation glossaries or
guides, for reasons of space or simplicity. OBGP, for example, sets out its stall by declaring
that only one pronunciation deemed to be correct will be supplied.

However, where variant pronunciations exist, contemporary pronouncing diction- 
aries generally list all possibilities. In LPD, where multiple pronunciations are given, 
the first is recommended, with the pronunciations that follow it also being considered 
acceptable (or even unacceptable in the case of occasional words such as grievous or 
nuclear, which have a widely-used but deprecated variant, marked here with a warning 
sign). This ordered recommendation is based on substantial preference data from opin- 
ion polls soliciting responses from native speakers in Britain and the United States, lat- 
terly online (xviii). EPD remarks only that the first pronunciation given is ‘believed to be 
the most usual’ (xiii), whereas ODP explicitly states that ‘the ordering of variant pronun- 
ciations does not imply that one form is more desirable or “correct” than the other’ (x). 

Editors of pronouncing dictionaries often seek to distance themselves from a pre- 
scriptive position, not wishing to be regarded as self-appointed arbiters of correctness. 
‘It is very important to remember that in this dictionary we are not trying to tell you 
how English ought to be pronounced; we are presenting how we believe some native 
speakers of English actually do pronounce the words’ (EPD: vi). The process of finding 
this out generally involves research using other reference works, corpus data, polls, or 
consultation where possible, with responsible editors recognizing the subjective nature 
of their own linguistic experience and not relying on impressionistic personal perspec- 
tives. Which is not to say that editorial intuition plays no part. For some representative 
comments by editors of dictionaries compare: 


Ultimately, however, the decisions about which pronunciation to recommend, 
which pronunciations have dropped out of use, and so on, have been based on 
our intuitions as professional phoneticians and observers of the pronunciation of 
English ... over many years. (EPD: vi) 

Like all other compilers of such works I have relied primarily on general obser- 
vations of as many different speakers as possible. Present-day television and radio 
broadcasting have facilitated this as never before. (Windsor Lewis 1972: xiv) 


The sort of words which would be likely to have more than one pronunciation given 
would be words like scone, controversy, schedule, cervical, where variants co-exist and 
have their basis in regional difference or diachronic language change. When it comes to 
more fine-grained variation in pronunciation, much use can be made of abbreviatory 
conventions such as italicization, superscription, or bracketing of symbols to indicate 
optionality. This allows a range of variants to be codified with little or no extra space 
being used up. Wells (2005) offers an analysis of such abbreviation as applied in EPD, 
LPD, and ODP. 
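
How bracketing saves space can be shown with a short sketch. It assumes a convention in which a parenthesised symbol is optional, as in the respellings (h)wen and hī(ə)r reproduced in Figure 17.2; the function name `expand_optional` is hypothetical, and italicization and superscription are not modelled.

```python
import re
from itertools import product


def expand_optional(transcription):
    """Expand each parenthesised optional segment into its with/without forms."""
    # Splitting on a capturing group keeps the bracketed pieces in the result,
    # e.g. "(h)wen" -> ["", "(h)", "wen"].
    parts = re.split(r"(\([^()]*\))", transcription)
    choices = []
    for part in parts:
        if part.startswith("(") and part.endswith(")"):
            choices.append((part[1:-1], ""))  # segment present, segment absent
        else:
            choices.append((part,))           # fixed material, no choice
    return ["".join(combo) for combo in product(*choices)]


print(expand_optional("(h)wen"))  # ['hwen', 'wen']
print(expand_optional("hī(ə)r"))  # ['hīər', 'hīr']
```

One bracketed transcription stands for two variants, and each further bracket doubles the number of forms codified, which is why the convention costs so little space on the page.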

Loanwords and loan-phrases merit separate consideration from the native vocabu- 
lary stock, because of the questions of naturalization which arise. To what degree should 
loanwords be adapted to an English pronunciation model? A wholly foreign pronunciation
with sounds which do not occur in English or which might sound affected in
general use would probably not be helpful, but nor, perhaps, would an entirely anglicized
pronunciation, such as might be arrived at by an uninformed guess based on the
spelling. In such cases, editors must balance the need to be descriptive and prescriptive; 
to reflect actual contemporary pronunciation while providing users with the authority 
and accuracy they seek. The solution can be to supply a range of pronunciation options. 
This may include LPD’s practice of offering an original-language phonetic transcription 
for loanwords. There is considerable scope for variation in such cases. 

The word chorizo provides an example. EPD’s British pronunciation for chorizo
suggests /tʃəˈriːzəʊ/, /tʃɒ-/, /-səʊ/, and notes that ‘A common mispronunciation is
/tʃəˈrɪtsəʊ/’. LPD also gives /tʃəˈriːzəʊ/, /tʃɒ-/, omits the /-səʊ/ version, and suggests
the /-ˈrɪtsəʊ/ pronunciation without any depreciative comment or warning mark. (The
pronunciation with /ts/ perhaps has its basis in some speakers misapplying their knowledge
of the sound a <z> makes in Italian loanwords (pizza, mozzarella) to this word,
unaware of its Spanish origin or the difference that this makes to how a <z> would be
pronounced.) Neither proposes a naturalized version with /-θəʊ/, although that would be
closer to standard Castilian Spanish and uses no sound unavailable in the English
phonological set.

One word-type which features prominently in pronouncing dictionaries is the neolo- 
gism. Pronunciation of newly-coined words can fluctuate until the word beds in and 
becomes codified (if that is its destiny). Even when those who create a new word express 
a preference for how they would wish it to be pronounced, speakers of the language can 
have other ideas, and pronunciation lexicographers will take actual usage into account 
alongside such prescriptions. The word gif provides a recent example of this; developer 
Steve Wilhite argued in 2013 that his preference for a soft g ought to be the final word on 
the matter; lexicographers begged to differ: 


Both pronunciations are in use. As we explained when GIF was selected as Oxford 
Dictionaries USA Word of the Year 2012, ‘GIF may be pronounced with either a soft g 
(as in giant) or a hard g (as in graphic). The programmers who developed the format 
preferred a pronunciation with a soft g (in homage to the commercial tagline of the 
peanut butter brand Jif, they supposedly quipped “choosy developers choose GIF”). 
However, the pronunciation with a hard g is now very widespread and readily understood.’
A coiner effectively loses control of a word once it’s out there; for instance, the
coiner of quark in the physics sense had intended it to rhyme with cork, but general 
usage has resulted in it rhyming with mark. 


(John Simpson, Chief Editor of the OED (OUP press release)) 


17.8 SELECTION OF ACCENT MODEL 


Modern pronouncing dictionaries generally include, in their introductory matter, a
discussion and specification of the standard accent or accents which they set out to reflect.
Again, there is a lexicographical balance to be struck. On the one hand, many users of
such dictionaries wish for a standard, conservative form to be reflected. On the other 
hand, an accurate description of contemporary pronunciation, and an awareness of 
variation and change in the language in question, is essential for the pronouncing dic- 
tionary to have relevance and value. For a detailed discussion of recent developments in 
English pronunciation models, see Dziubalska-Kołaczyk and Przedlacka (2005).

Modern English pronouncing dictionaries aim for a model that is as inclusive as pos- 
sible: ‘broadly based and accessible’ (EPD: xii), ‘not regionally centred or redolent of 
class... judged to be unexceptionable by native speakers’ (ODP: xii). In order to achieve 
this inclusivity, editors must reach decisions about which variants to include within 
their standard model. Examples of such cases are presented in Figure 17.1. 

Editors of early pronouncing dictionaries had different ideas, and adopted an 
unapologetically prescriptive line when it came to describing their model. Walker’s 
dictionary was subtitled ‘Rules to be observed by the Natives of Scotland, Ireland, and 
London, for avoiding their respective Peculiarities’; instruction in standard pronuncia- 
tion was the book’s very purpose. Daniel Jones, in his seminal 1917 English Pronouncing 
Dictionary, described his model as reflecting the accent ‘most usually heard in every- 
day speech in the families of Southern English persons whose menfolk have been edu- 
cated at the great public boarding-schools’; he initially referred to this as Public School 
Pronunciation before settling on the term Received Pronunciation (RP). In his 14th edi- 
tion of Jones’ EPD (1977), Gimson called the term ‘hardly tenable today’ although he 
continued to use it for convenience’s sake. 

In its most recent edition, EPD refers instead to ‘BBC English’, which it calls ‘the
accent most often heard in the speech of newsreaders and announcers on serious channels
of the BBC’ (vi), although the problematic term RP is still mentioned and discussed,
here and elsewhere. EPD is not alone among more recent titles in referring to broadcasting
when defining their model, although it is arguable whether this is much more
than a relabelling exercise, and a rather circular one. Whatever name they choose to
give their standard accent model, and however broad their approach, pronunciation
lexicographers cannot entirely avoid recommending a variety by the very process of
codification.


example     possible          comments
headword    transcriptions

DUNE        djuːn             The former is somewhat conservative and marked, but still in use.
            dʒuːn             The latter is more widely heard.

GRASS       grɑːs             The former is the southern standard; ODP also includes the latter,
            gras              midland and northern English form throughout.

PURE                          Variation in vowel quality. The former is perceived as more
                              conservative within a southern standard, but also features in
                              other varieties of British English.

                              Variation in rhoticity. Most English pronouncing dictionaries do
                              not reflect rhoticity in their standard British English model,
                              although OBGP does by means of an optional (r).


FIGURE 17.1 Examples of variations in pronunciations of standard British English

The model of Duden’s Aussprachewörterbuch underwent an interesting shift, revealed by
a comparison of Mangold’s editorial forewords. In 1962, the work deliberately dealt
only with Hochlautung (‘high pronunciation’), excluding everything associated with
dialect and slang (Alles Mundartliche und Umgangssprachliche) and aligning itself, to
some degree, with the highly formal and somewhat rarefied performance model known
as Bühnenaussprache (‘stage/platform pronunciation’). In the 1974 second edition,
Mangold stepped firmly away from such formality in favour of a more realistic, general,
supra-regional model which he named Standardaussprache (‘standard pronunciation’).
He commends this model as suitable for all general speech situations, all parts of
the German-speaking world, and with interlocutors from all walks of life, and adds that 
learners of German as a foreign language will also greet this approach as being closer to 
reality. As might be expected, this position has been maintained in subsequent editions 
of the dictionary. 

Regionally restricted non-standard accents are not usually reflected in the main text 
of pronouncing dictionaries in any language, although they may be discussed in the 
accompanying matter. Editors often express the sentiment that, if only space allowed, 
it would be desirable or interesting to represent a wider range of accents, but for prac- 
tical reasons they go on to give recommendations only in one standardized form. 
Non-standard pronunciations are therefore more likely to be found in regional and dia- 
lect dictionaries, for discussion of which see Upton (this volume). 

Pronunciation realizations can also vary quite considerably depending on the rate at 
which someone is speaking. Although this is not strictly a matter of accent, it merits a brief 
mention here, as it is another possible parameter of variation in pronunciation on which a 
decision must be made by the editor of a pronouncing dictionary. In practical terms, modern
pronunciation lexicographers agree with Passy’s classic recommendation that the best
rate for the purposes of instruction was ‘prononciation familière ralentie’ (Passy 1906: 39),
usually translated as ‘slow conversational style’; neither too rapid nor too laboured.


17.9 TRANSCRIPTION 


Our focal group of English pronouncing dictionaries, LPD, ODP, and EPD, use the symbols 
of the International Phonetic Association's alphabet (generally known as IPA) for transcrip- 
tion. Many other transcription systems are in use; American pronouncing dictionaries tend 
to prefer respelling, and various language-specific phonetic systems have been developed 
and continue to be used instead of IPA. See Section 17.2 above for discussion of transcription 
in a historical context; Kemp (2006) provides an interesting overview of the early history 
of phonetic transcription in general. Pointon (this volume) discusses the representation
of pronunciation in general dictionaries, and provides useful illustrations and examples of
transcription systems for comparison. In this section, a survey of respelling and phonetic 
transcription is offered, with a particular focus on how IPA is used in pronouncing diction- 
aries. For a more general guide to phonetic transcription and analysis, see Wells (2006). 


17.9.1 Respelling 


Full transcription requires the elaboration of a system which matches symbols to sounds 
on a one-to-one basis. A respelling system takes the normal orthography of a language as 
its starting point, adapting it only as much as is required to eliminate ambiguity in the
pronunciation. For example, although the English letter <g> may be pronounced with a ‘hard’
or ‘soft’ sound, a respelling system would assign the symbol <g> only to the hard sound /g/,
perhaps reserving the symbol <j> for the ‘soft g’ /dʒ/ sound.

Respelt pronunciations deviate from the actual spelling of the word only as much as is
necessary, which may give the initial impression of simplicity and accessibility. Letters
present in spelling which are silent can be omitted, and most consonants are relatively
straightforward to deal with. Digraphs can be used: for example <zh> for the sound in
measure (IPA /ʒ/). English’s vowel system is rather more complicated and requires the use
of many more digraphs (<igh>, <oo>, <aw>, etc.) and careful assignment and definition.
A dictionary user seeing <ow> must know in advance whether s/he is intended to read it as
in cow or as in low; if <y> is assigned to the consonant at the beginning of yes it should
not also be used for the vowel sound in by or my (<igh> or <I> are the usual solutions
there). Alternatively, diacritics such as macrons may be introduced: <ā> as in made vs.
<a> as in mad, <ē> as in feed vs. <e> as in fed. See also Pointon (this volume) on the
history of the use of such respellings. Macrons are used in the BBC’s traditional
respelling, called Modified Spelling (see Pronouncing Dictionary of British Names, 1971),
although the BBC Pronunciation Unit developed a parallel plain-text system which can be
used when a pronunciation must be communicated via SMS or teleprompter, and special
characters and diacritics are not practical. A very similar system was used in OBGP.¹

Figure 17.2 sets out a typical respelling system with example words. This is the system 
used in the New Oxford American Dictionary. Note how macrons, diaereses, and breves 
are used, as well as capitalization in digraphs which use symbols which would mean 
something different in isolation. 
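
The principle of assigning each respelling symbol to exactly one sound can be sketched as a small lookup table. The table below is a hypothetical toy, loosely following the conventions discussed in this section (<g> for the hard sound only, <j> for the ‘soft g’, <zh> for the sound in measure, <igh> for the vowel of by); the symbol <uh> for the unstressed vowel is an invention for this sketch. It is not the key of NOAD, the BBC, or any other published system, and the names `RESPELLING`, `respell`, and `is_one_to_one` are assumptions made for illustration.

```python
# Hypothetical sound-to-symbol table (IPA phoneme -> respelling symbol).
RESPELLING = {
    "dʒ": "j",    # 'soft g', so that <g> can be kept for the hard sound
    "tʃ": "ch",
    "aɪ": "igh",  # vowel of by and my
    "ʒ": "zh",    # sound in measure
    "ʃ": "sh",
    "g": "g",     # hard sound only
    "j": "y",     # consonant at the beginning of yes
    "ə": "uh",    # invented symbol for this sketch
    "b": "b", "e": "e", "m": "m", "s": "s", "t": "t",
}


def is_one_to_one(table):
    """True when no respelling symbol has been assigned to two sounds."""
    return len(set(table.values())) == len(table)


def respell(ipa):
    """Convert a phonemic IPA string to respelling, longest symbol first."""
    keys = sorted(RESPELLING, key=len, reverse=True)
    out = []
    i = 0
    while i < len(ipa):
        for key in keys:
            if ipa.startswith(key, i):
                out.append(RESPELLING[key])
                i += len(key)
                break
        else:
            raise ValueError(f"no respelling defined for {ipa[i]!r}")
    return "".join(out)


print(is_one_to_one(RESPELLING))  # True
print(respell("dʒem"))            # jem
print(respell("get"))             # get
print(respell("jes"))             # yes
```

A table that passes the one-to-one check can still yield ambiguous strings once symbols are run together, since <s> followed by <h> would look the same as <sh>. Capitalizing digraphs, as in the system shown in Figure 17.2, is one way of avoiding this.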


17.9.2 Marking Stress 


In English, stress is not phonologically predictable and must be indicated. In phonological
terms, stress is a nonsegmental feature, which means that it is not associated with just


1 Full guides to both BBC systems can currently be downloaded from
<http://www.bbc.co.uk/commissioning/tv/resources/pronunciation.shtml>.


symbol    examples

a         hat /hat/, fashion /ˈfaSHən/, carry /ˈkarē/
ā         day /dā/, rate /rāt/, maid /mād/
ä         lot /lät/, father /ˈfäT͟Hər/, barnyard /ˈbärnˌyärd/
b         big /big/
CH        church /CHərCH/, picture /ˈpikCHər/
d         dog /dôg/, bed /bed/
e         men /men/, bet /bet/, ferry /ˈferē/
ē         feet /fēt/, receive /riˈsēv/
e(ə)r     air /e(ə)r/, care /ke(ə)r/
ə         about /əˈbout/, soda /ˈsōdə/, mother /ˈməT͟Hər/, person /ˈpərsən/
f         free /frē/, graph /graf/, tough /təf/
g         get /get/, exist /igˈzist/, egg /eg/
h         her /hər/, behave /biˈhāv/
i         fit /fit/, guild /gild/, women /ˈwimin/
ī         time /tīm/, guide /gīd/, hire /hī(ə)r/
n         need /nēd/, honor /ˈänər/, maiden /ˈmādn/
o͝o        wood /wo͝od/, football /ˈfo͝otˌbôl/, sure /SHo͝or/
o͞o        food /fo͞od/, music /ˈmyo͞ozik/
ou        mouse /mous/, coward /ˈkou(ə)rd/
p         put /po͝ot/, cap /kap/
r         run /rən/, fur /fər/, spirit /ˈspirit/
s         sit /sit/, lesson /ˈlesən/, face /fās/
SH        shut /SHət/, social /ˈsōSHəl/, action /ˈakSHən/
t         top /täp/, seat /sēt/
TH        thin /THin/, truth /tro͞oTH/
T͟H        then /T͟Hen/, father /ˈfäT͟Hər/
v         never /ˈnevər/, very /ˈverē/
w         wait /wāt/, quit /kwit/
(h)w      when /(h)wen/, which /(h)wiCH/
z         zipper /ˈzipər/, musician /myo͞oˈziSHən/
ZH        measure /ˈmeZHər/, vision /ˈviZHən/


FIGURE 17.2 Respelling transcriptions used in New Oxford American Dictionary

Handbook presents the alphabet with several illustrations from particular languages,
guidelines for using the IPA, and detailed discussion and explanation of the underly- 
ing principles. These include the assumption that ‘speech can be represented partly as a 
sequence of discrete sounds or “segments” ’ but that nonsegmental aspects such as stress 
must be represented independently, and that ‘segments can usefully be divided into two 
major categories, consonants and vowels’ which can be described ‘with reference to how
they are produced and to their auditory characteristics’ (Nolan and Esling 1999: 4).

The IPA chart reproduced in Figure 17.3 presents the entire range of core IPA sym- 
bols, which can be used to make a phonetically detailed transcription of speech in any 
language. The vowels appear in the cardinal quadrilateral which is an abstraction from
a sagittal section through the space in a speaker’s mouth. Consonants are arranged
according to their place and manner of articulation, and whether or not they are voiced. 

In a pronouncing dictionary for a specific language, only a small number of these IPA
symbols would be required, and such a level of precision would be unsuitable; dictionary
transcriptions are phonemic rather than allophonic. In very broad summary, a phoneme
is held to be a distinctive sound within a particular language which, if swapped
with another, could change the meaning of the word. Allophones are variant realizations
of a phoneme; substituting one allophone for another within the particular language
under consideration would not change meaning. Each of our focal group of dictionaries
discusses allophonic variation in its introductory matter, which allows the broader,
simpler phonemic transcriptions to be used in the main text.²
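
The distinction can be made concrete with a short sketch. The examples here are assumptions chosen for illustration: fed and feed are taken as a pair differing in one phoneme, and the aspirated and unaspirated realizations of English /p/ are a standard textbook case of allophones, not an example drawn from EPD, LPD, or ODP. The names `is_minimal_pair`, `ALLOPHONE_TO_PHONEME`, and `to_phonemic` are hypothetical.

```python
def is_minimal_pair(a, b):
    """True when two phonemic transcriptions differ in exactly one segment."""
    if len(a) != len(b):
        return False
    return sum(x != y for x, y in zip(a, b)) == 1


# Illustrative mapping from narrow (allophonic) symbols to phonemes.
ALLOPHONE_TO_PHONEME = {
    "pʰ": "p",  # aspirated realization of /p/
    "p": "p",   # unaspirated realization of /p/
}


def to_phonemic(segments):
    """Reduce a narrow transcription to the broader phonemic one."""
    return [ALLOPHONE_TO_PHONEME.get(s, s) for s in segments]


# Swapping one phoneme for another gives a different word.
print(is_minimal_pair(["f", "e", "d"], ["f", "iː", "d"]))  # True

# Swapping one allophone for another leaves the phonemic form unchanged.
print(to_phonemic(["pʰ", "ɪ", "n"]) == to_phonemic(["p", "ɪ", "n"]))  # True
```

Because both narrow forms reduce to the same phonemic string, a dictionary can print that string once and leave the allophonic detail to its introductory matter.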


17.10 AUDIO: THE FUTURE OF PRONOUNCING DICTIONARIES?


Respelling was a genuine reform in directness, changing tedious description and 
analysis to immediate representation of the sounds in print. Are we not ready for 
another radical change, this time from visual symbols to oral-aural pictures of the 
pronunciation through phonographic or other recordings? ... If this road is followed
through, we shall some day have a talking dictionary. (Emsley 1940: 59)


Emsley’s brave new world, in which dictionaries could talk, is now truly with us. 
Certain print pronouncing dictionaries, including LPD and EPD, have been accompa- 
nied by audio CD-ROMs for some years now. An increasing number of general diction- 
aries which can be consulted online feature clickable links for each headword which 
play an audio file of the word being spoken in one or more accents. Such audio files may 
be synthesized from the transcription; more usually, they are recordings of the word 
being spoken aloud by an actor. For example, the portal at <www.oxforddictionaries.com>
allows users to consult several monolingual and bilingual dictionaries, and presents
recordings of pronunciations alongside written transcriptions. Such recordings
are easy to listen to on a PC using headphones or integrated speakers, or using a
smartphone or tablet.


2 For discussion of phoneme theory in general, see Lass (1998), Ladefoged (2006), or Clark et al.
(2007).


[Chart: THE INTERNATIONAL PHONETIC ALPHABET (revised to 2005), © 2005 IPA. Panels: CONSONANTS
(PULMONIC); CONSONANTS (NON-PULMONIC); VOWELS; OTHER SYMBOLS; SUPRASEGMENTALS; DIACRITICS;
TONES AND WORD ACCENTS.]

FIGURE 17.3 IPA Chart

Source: <http://www.internationalphoneticassociation.org/content/ipa-chart>, available under a Creative Commons
Attribution-ShareAlike 3.0 Unported License. Copyright © 2005 International Phonetic Association

The drive for lexicographers to include audio pronunciations in dictionaries comes 
not only from technological advancements which make it easier to do so, but also from 
the growth in pronunciation products which put audio at their heart. Several free web- 
sites now exist which allow users to listen to recordings of words; they effectively func- 
tion as pronunciation dictionaries. In some cases (e.g. <inogolo.com>, which restricts 
itself to American English) the words are spoken by the site’s owner, and users must 
judge for themselves how accurate or authoritative the pronunciations might be. Other 
sites, most notably <forvo.com>, use a crowdsourcing model, in which users are invited 
to submit their own recordings, and rate those made by others. Multiple recordings for 
single entries are encouraged (‘Could you do it better? Different accent?’) and maps
showing the declared location of contributors are featured. As this model matures, it 
presents a significant challenge to professional lexicographers, who must persuade users 
of the authority and quality of their dictionary product in order to remain competitive. 

Could audio take the place of written transcriptions entirely, as Emsley proposed? 
Listening to a spoken pronunciation is certainly more direct, avoiding the decoding
process necessary in reading and interpreting IPA or respelling. This could make a dic- 
tionary seem simpler or more accessible to users who are challenged by the presence of 
a transcription, although this would perhaps be more of an issue for general dictionaries 
than the specialist pronouncing dictionaries which are this chapter's focus. But it is not 
always possible or desirable for those who wish to consult a pronunciation dictionary to 
listen to sound files, however advanced their equipment and however user-friendly the 
interface may be. A user who is in a quiet place, or who has hearing difficulties, or who 
for whatever reason requires the pronunciation in written form, will continue to look 
for a transcription. If synthesis is used to produce audio files, transcription would be 
required as its input; if actors are used to record words aloud, they too require accurate 
transcriptions to read. A recording made by a speaker will also, of course, reflect that 
individual’s particular speech, whereas a phonemic transcription is a broader abstraction
which, properly interpreted, allows multiple potential variations in accent and idiolect
to be embraced. ‘Talking dictionaries’ are now a reality, but audio complements
rather than supplants transcription and should, in the case of pronouncing dictionaries 
at least, continue to do so. 


CHAPTER 18 
FRANZISKA BUCHMANN


18.1 INTRODUCTION 


THE existence of spelling dictionaries is closely connected with orthography. Orthography
prescribes how to spell a certain word correctly, whereas graphematics describes the writing
system and its internal regularities. Comparing German with English, it becomes clear
that a historically developed writing system does not necessarily lead to an explicitly ortho- 
graphic system governed by rules. German is one of the (written) languages that have an 
authorized orthographic rule system (cf. Nerius 1990, who analyses spelling dictionaries 
of European written languages). As such, it provides a suitable and well-researched focus 
for an investigation of the distinctions between graphematics and orthography, on the one 
hand, and the function of spelling dictionaries, on the other. 

In this chapter the relation between orthography and graphematics will be analysed 
first, and secondly the macro- and microstructure of the most popular spelling dictionaries
in German, the ‘Rechtschreib-DUDEN’ and the ‘Rechtschreib-WAHRIG’, will be
examined. After this, their function as dictionaries will be considered. Spelling dictionaries
contain an alphabetically ordered wordlist showing the correct spelling of these
words. This wordlist constitutes one part of the doubly-coded orthography in German.
Lastly, the self-perception of the Rechtschreib-DUDEN will be explored. Nowadays the
DUDEN is the best-selling dictionary in Germany and it has huge authority. To understand
this outstanding position, one has to focus on the history of the DUDEN.


18.2 THE RELATION BETWEEN 
ORTHOGRAPHY AND GRAPHEMATICS 


German linguists began to discuss the relation between orthography and graphemat- 
ics at the very beginning of research on the writing system. According to Eisenberg 
(1983: 41), the terms orthography and writing system/graphematics do not mean the same thing:
orthography is associated with standard language, unity, and accuracy; the writing system,
by contrast, is associated with the structure of written units and cannot be separated
tem instead is associated with the structure of written units and cannot be separated 
from the language system. The writing system consists of the entirety of the internal 
regularities of the written language. The use of writing constitutes the empirical basis 
of graphematical analysis. That means that graphematics describes how people actually 
write. A graphematical analysis has to explore which regularities evoke a certain spell- 
ing. For that reason descriptive graphematics will not exclude orthographic violations 
from the data set which is used in the study (Eisenberg 2006: 303). Graphematics as 
a natural system of language displays the same natural language change as phonology, 
morphology, syntax, or semantics (Nübling 2008: 208ff.). In fact one can also assume
functional concepts for explaining written language change. 

Graphematics as a part of grammar is connected with all other grammatical disci- 
plines. That means the spelling of a word reveals its grammatical structure. As the first 
step in determining the (correct) spelling of a certain word, one has to decide whether 
or not this unit is a potential word. Subsequently the phonological, morphological, 
and syntactic status of the unit needs to be determined (cf. e.g. Günther 1997; Jacobs
2005; Fuhrhop 2007, who analyse the grammatical foundation of compound and 
syntagma spelling). One assumes spelling principles at different grammatical levels 
(cf. e.g. Eisenberg 2006: 301ff.; Nerius 2007: 99ff.; Fuhrhop 2009). The phonographic
spelling principle describes the regular phoneme-grapheme correspondences and 
analyses the influence of the syllabic position on the corresponding grapheme. One can 
thus describe the basic units of the writing system and how they are combined to form 
more complex units such as graphematic syllables and words (cf. Fuhrhop 2008). The 
syllabic principles comprise, for example, the spelling of a tensed vowel: See, Reh, dehnen,
leben (‘lake, deer, to stretch, to live’) all have a long tensed vowel [e:], but the writing
system has established different possibilities for representing this. The morphologi- 
cal spelling principle deals among other things with the morpheme identity between 
words belonging to the same inflectional (and derivational) paradigm. On the border- 
line between morphological and syntactic spelling principles, one can assume that one 
morphological and syntactic word is written as one graphematic word. If two adjacent 
units are involved in one morphological word formation process and these two units 
fulfil only one syntactic function within a sentence, one has to write these units as one 
graphematic word. In the following I will call this compound spelling, even though the 
written unit need not be a morphological compound. If two adjacent units represent 
two syntactic words and two different syntactic functions within a sentence, these units 
are written separately as two graphematic words. I will call this syntagma spelling (see 
Fuhrhop 2007; Fuhrhop and Peters 2013: 257ff.). Furthermore in German we have a syn- 
tactically motivated capitalization of certain words determining the core of the noun 
phrase; I will call this capitalization spelling (cf. Bredel 2006, 2010; Günther 2007). The
opposite of capitalization spelling I will call lower-case spelling. Summing up, the spell- 
ing principles clarify that the German writing system is not a simple phonographic one 
(cf. Enderle 2005, who summarizes the criteria of the written language’s autonomy; also 
Dürscheid 2006: 23ff., who briefly summarizes the relation between written and spoken
language, and graphematics and phonology). This relation is characterized by a comparative
depth. Compared to the German writing system, the Spanish writing system
is more phonographic and the French writing system is more morphologically deter- 
mined (Meisenburg 1999). Besides this (broad) definition of graphematics, a narrow 
definition is discussed (Neef 2004, 2005, 2007). In this case only the phoneme-grapheme
correspondences are part of graphematics; all the other spelling principles are classified
as orthographic principles belonging to a so-called systematic orthography. The
authorized orthographic rules constitute another part of the system. 

The definition of the term orthography is well established. Orthography is character- 
ized by the following four criteria: codification, its binding character, a strong stabil- 
ity, and little variability in terms of the existence of alternative spellings for one lexeme 
(cf. Nerius 2007: 34ff.). The orthographic rules prescribe the external standardization of 
writing; they deal with the question of how to spell a certain word correctly. The ortho- 
graphic rules are accepted at a certain time by a certain society; they are obligatory for 
school teaching and public authorities or administrations. The orthographic rules are 
codified and binding. In 1901, at the so-called Second Orthographic Conference (Zweite 
Orthographische Konferenz), the orthographic rules for German were enacted, and if 
one explores this process, it becomes obvious that not only linguistic considerations 
played a role, but also socio-political, socio-historical, and economic ones (cf. Nerius 
2007: 342ff.). Changing the orthographic rules is a special act which can be compared 
to the act of changing a law, and this implies a conservative, prescriptive character for 
orthography. But I would point out that in the core aspects of the writing system, the 
orthographic rules are based on graphematic principles. Only in the periphery do 
graphematic analyses and orthographic rules not lead to the same spellings. One can 
observe this on the periphery of capitalization spelling or compound and syntagma 
spelling, for example: 


(1) im allgemeinen – im Allgemeinen
‘in general’ 


Example (1) was often cited during the orthographic reform of 1996. Before 1996 it was 
written in lower-case letters (DUDEN 1991: 32, R 65). Since 1996, the reformed orthography
has required the second word to be capitalized (Amtliches Regelwerk 2006: §57(1)). The capitalization
spelling identifies the core of a noun phrase, but the word allgemeinen does
not show the grammatical behaviour of such a unit. One cannot add an adjective to the 
potential core unit (see (2)), which is one criterion among others characterizing the core 
of a noun phrase (see Fuhrhop and Peters 2013: 269-75).


(2) a. *im schönen Allgemeinen
       in-the nice general
    b. im schönen Wald
       in-the nice forest
The orthographically prescribed capitalization spelling of im Allgemeinen thus conflicts with the
graphematic analysis.

German orthography is doubly coded (cf. Kohrt 1990). On the one hand, there are 
the legalized prescriptive spelling rules in the form of a text. They are published on the 
web page of the Rat für deutsche Rechtschreibung (<www.rechtschreibrat.com>). On the
other hand, there is a wordlist which consists of examples for these rules, for example
the wordlist on the web page of the Rat für deutsche Rechtschreibung or the wordlist in
the spelling dictionaries (cf. Lasselsberger 2000, who examines the codification of the 
orthography in spelling dictionaries). Coping with orthography is a highly valued skill. 
If one does not want to attract negative attention, one has to follow the orthographic 
rules, for example when completing application documents. Proficiency in orthography 
has high social prestige and is considered a mark of education. This is probably true not 
only for German but also for every other society with a high literacy rate. 

Recent history and developments in German orthography can be illustrated by an 
examination of the compound and syntagma spellings listed in example (3). As mentioned, 
German has had an authorized and standardized orthography since 1901. In fact, the ortho- 
graphic rules did not change between 1901 and the first orthographic reform in 1996 (cf. 
Nerius 2007: 287ff., 375ff., who summarizes the historical development of the orthographic 
rule system). The first reform was all-encompassing; in particular, the changes in the sys- 
tems of punctuation marks, compound and syntagma spelling, and upper- and lower-case 
spelling ran contrary to writers’ intuitions. To clarify this, let us look at the following exam- 
ples for the system of compound and syntagma spelling (cf. Fuhrhop 2007: 31ff.): 


(3) a. Auto fahren, eislaufen, brustschwimmen
       ‘drive a car’, ‘skate’, ‘swim (breaststroke)’
    b. Auto fahren, Eis laufen, Brust schwimmen
    c. Auto fahren, eislaufen, brustschwimmen/Brust schwimmen


Example (3a) shows the spelling before 1996. These combinations of a unit which looks
like a noun (here: Auto ‘car’, Eis ‘ice’, and Brust ‘breast’) and a verb (here: fahren ‘drive,
ride’, laufen ‘walk, go’, and schwimmen ‘swim’) were written differently, and this correlated
with their grammatical behaviour. Auto ‘car’ has noun characteristics, but eis- ‘ice’
and brust- ‘breast’ do not. The sentence in (4a) is grammatically correct. However, the
sentences in (4b, 4c) are incorrect (cf. Fuhrhop 2007 for the criteria whether or not to
write two units together in one graphematic word).


(4) a. Ich fahre ein schönes Auto.
       I drive a nice car
    b. *Ich laufe ein schönes Eis.
       I skate a nice ice
    c. *Ich schwimme eine schöne Brust.
       I swim a nice breast
In contrast, (3b) exemplifies the spelling after 1996. The reform postulated the syntagma
spelling as the default case. Writing the units separately in two words involved a re- 
analysis of the non-verbal units as nouns, leading to the capitalization spelling, which 
was not consistent with writers’ intuitions and with the unit’s grammatical behaviour. 
The reform was discussed passionately among laymen, teachers, pupils, and linguists, 
and resulted in a new reform in 2006, which basically addressed the controversial and 
disputed questions of the first reform. One can see the resulting spellings in (3c). The 
answer to the problem was the reintroduction of alternative spellings, which is a useful 
solution from a graphematic perspective. When in doubt about the grammatical behav- 
iour of units, why (should one) exclude alternative spellings from orthography? This 
rather simplified demonstration of changes to and development of the German ortho- 
graphic reforms is a useful example to aid understanding of the sometimes passionate 
discussion around the orthographic rules. 
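
The three stages in example (3) can also be laid out as a small lookup table, which makes it easy to see where the 2006 reform reintroduced alternative spellings. The following is only an illustrative sketch: the spellings come from example (3), while the period labels, the dictionary layout, and the function names are assumptions introduced here.

```python
# Spellings are taken from example (3) in the text. The period labels,
# the data layout, and the function names are assumptions of this sketch.
SPELLINGS = {
    "drive a car": {
        "before 1996": {"Auto fahren"},
        "1996 reform": {"Auto fahren"},
        "2006 reform": {"Auto fahren"},
    },
    "skate": {
        "before 1996": {"eislaufen"},
        "1996 reform": {"Eis laufen"},
        "2006 reform": {"eislaufen"},
    },
    "swim (breaststroke)": {
        "before 1996": {"brustschwimmen"},
        "1996 reform": {"Brust schwimmen"},
        "2006 reform": {"brustschwimmen", "Brust schwimmen"},
    },
}


def is_authorized(gloss: str, period: str, spelling: str) -> bool:
    """Return True if the spelling is listed for this gloss in this period."""
    return spelling in SPELLINGS[gloss][period]


def has_variants(gloss: str, period: str) -> bool:
    """Return True if more than one spelling is listed for this period."""
    return len(SPELLINGS[gloss][period]) > 1
```

In this layout only ‘swim (breaststroke)’ has two permitted spellings after 2006, which mirrors (3c).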

To sum up, one can assume that there is a well-defined relation between (the broad 
definition of) graphematics and orthography. Graphematics contains the internal writ- 
ing system. In contrast, orthography prescribes the external standardized spelling rules. 
Both systems should be analysed separately, but both systems together constitute the 
German writing system. Graphematic regularities and orthographic rules correlate at 
the core of native writings, but not at the periphery. 


18.3 THE MACROSTRUCTURE 
OF GERMAN SPELLING DICTIONARIES 
(DUDEN AND WAHRIG) 


This section describes the macrostructure of the widely accepted spelling dictionaries 
of German: the DUDEN and the BROCKHAUS-WAHRIG. I shall give an account of the
DUDEN’s history in the context of its self-perception. The BROCKHAUS-WAHRIG
(2011) dictionary continues the BERTELSMANN (1996) and WAHRIG
(2005, 2006) spelling dictionaries.¹

The DUDEN and the BROCKHAUS-WAHRIG dictionaries have nearly the same
number of keywords. The DUDEN (2009) contains about 135,000 keywords and
the BROCKHAUS-WAHRIG (2011) 140,000. In both dictionaries the alphabetical
wordlist accounts for the central part of the book; although the coverage of other
topics varied considerably in previous editions, this is not the case for the DUDEN (2009) and
BROCKHAUS-WAHRIG (2011).


1 From 1984 to 2008 Brockhaus, which edited WAHRIG, belonged to the same publishing company
as the DUDEN. In 2008 WAHRIG switched to the subsidiary company of Bertelsmann. In WAHRIG (2005,
2006) one can see that the BERTELSMANN (1996), which is out of print, and WAHRIG are connected
to each other: they both have the same macrostructure (author, content, structure, the accompanying text
about the history of German orthography). Lately the WAHRIG spelling dictionary has been published as
BROCKHAUS-WAHRIG (2011).

The DUDEN (25th edition, 2009) is visually divided into two parts: the first is highlighted
in yellow; the second is highlighted in white. The first part consists of the following
subchapters: the common apparatus for using the spelling dictionary, a list
of abbreviations, and the most important grammatical terms used in the dictionary 
(DUDEN 2009: 9ff.). The next subchapter deals with orthographic rules. These are 
based on the authorized rules, but they also contain interpretations by the editorial 
office (DUDEN 2009: 25). They are listed in alphabetically ordered keywords. This order 
of keywords is exactly the order of the orthographic rules in DUDEN before the first
reform of orthography in 1996 (DUDEN 1991). In the following, I will call these rules the 
corporate orthography (DUDEN 2009: 25ff.). The first part of the dictionary continues 
with a style guide for written and electronic texts plus some advice for writing business 
letters and e-mails. Then a list of correction marks, the Greek and Russian alphabets, 
and a wordlist contrasting spellings according to the old and the new orthographic 
rules follow (2009: 99ff.). Subsequently some of the changed orthographic rules since 
1996 are illustrated. The editorial office calls these the most important changes, for 
example compound and syntagma spelling and capitalization spelling (2009: 141ff.). 
In connection with this, a note is given as to which press agencies use the authorized
spelling rules (2009: 148). A short summary of the history of the orthographic
rules in tabular form and some details about German words, for example the most fre- 
quent words in German, conclude part one (2009: 149ff.). The second and last part of 
the dictionary—highlighted in white—is the alphabetically ordered wordlist, which is 
the main part of the dictionary (2009: 167ff.). In contrast to previous editions of the 
DUDEN, a third part of the dictionary was omitted in the 25th edition. Up to and including
the 24th edition (DUDEN 2006), the third part, highlighted in grey, comprised the
authorized spelling rules according to the second orthographic reform of 2006. Other 
differences between the 24th and the 25th edition concern the order of the subchapters 
within the yellow highlighted (first) part. 

The BROCKHAUS-WAHRIG (2011) differs from the DUDEN in only two points. It starts
with common details on using this spelling dictionary. A list of the abbreviations and
IPA symbols that are used in the dictionary follows. The Greek, Russian, and Hebrew
alphabets and a list of correction marks follow. After that there is a style guide for written
and electronic texts, plus some advice for writing business letters or e-mails, and advice
on how to write a job application plus curriculum vitae. These are nearly
the same points the DUDEN has. After that the authorized orthographic rules follow; here
one can see a huge difference between the DUDEN and the BROCKHAUS-WAHRIG. The
BROCKHAUS-WAHRIG (2011: 33ff.) gives the authorized orthographic rules primacy by
placing them before the alphabetic word list. The DUDEN (2009) provides its corporate
orthographic rules charged with its own advice and interpretations, which illustrates the
remains of its supremacy before 1996 when the DUDEN was the orthographic authority
in cases of doubt (see Section 18.6). After that the BROCKHAUS-WAHRIG (2011: 86ff.)
lists some of the new orthographic rules at a glance. And here one can see the second
difference compared to DUDEN. The BROCKHAUS-WAHRIG editorial office does not claim
to make any interpretations of the legalized orthographic rules, at least not in this chapter.
This does not, however, exclude the possibility for BROCKHAUS-WAHRIG to indirectly
interpret the orthographic rules in the word list, for example by recommending a certain
spelling in cases where two alternatives are permitted (see Table 18.1). Returning to the
macrostructure of the BROCKHAUS-WAHRIG, a short grammar follows which explains
all the grammatical terms used in the dictionary (2011: 98ff.). The following alphabetical
wordlist makes up the main part of the dictionary (2011: 1:3ff.). In the last chapter the
declension of pronouns, nouns, and adjectives and the conjugation of regular and irregular
verbs are listed in numbered tables (2011: 1221ff.). In former editions (WAHRIG 2005,
2006; BERTELSMANN 1996) information on the historical development of orthographic
rules was provided. In the latest edition this short chapter has been omitted.


Table 18.1 Recommendations of a certain spelling in DUDEN (2009) and BROCKHAUS-WAHRIG (2011)

Authorized orthographic spelling                 DUDEN’s recommendation    BROCKHAUS-WAHRIG’s recommendation
außerstande, außer Stande ‘incapable, unable’    außerstande               außerstande
dickmachen, dick machen ‘thicken’                dick machen               dickmachen


18.4 THE MICROSTRUCTURE OF SPELLING 
DICTIONARIES (DUDEN AND WAHRIG) 


Both the DUDEN and the BROCKHAUS-WAHRIG contain not only orthographic details
for each keyword but also grammatical information. However, this is not the central
function of a spelling dictionary.

As above I will start with the DUDEN (2009). Each keyword is semi-bold. Long and 
straight vertical lines indicate syllabification. New orthographic spellings after 1996 
are printed in red. This also applies to the syllabification if it was changed during the 
reform of the orthographic rules. If there are now two alternative spellings, the DUDEN 
recommends one of them which is additionally highlighted in yellow. Apart from this 
purely orthographic information, the DUDEN contains grammatical details concern- 
ing pronunciation and stress, origin and application, and inflections. A point under a 
certain vowel grapheme signals a short stressed (lax) vowel, a horizontal line under a 
certain vowel grapheme signals a long stressed (tensed) vowel. The area of use, applica- 
tion, and sometimes the meaning of a word are specified in parentheses. The keywords 
come from multifarious registers, for example dialects, different technical languages, 
colloquial language. The pronunciation transcribed in IPA symbols, additional explanatory
notes to the information in parentheses, and truncations of any kind² are provided
in square brackets. The etymological origin, for example French or Latin, is given
in angle brackets. For each noun the inflectional forms of the genitive singular and the 
nominative plural and the gender are specified. For regular verbs the DUDEN gives no 
grammatical information; for each irregular verb one can find the second person sin- 
gular simple past active indicative, the second person singular simple past active sub- 
junctive, the past participle, and the imperative singular. Other irregular inflectional 
forms are given in case of need. For adjectives, the characteristics and irregular forms 
of comparison are specified. If there are homographs with different meanings the key- 
words have a superscript character at the beginning. 
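
The bracket conventions just described amount to a small record structure. The sketch below models a noun entry and renders it with the delimiters named in the text (vertical lines for syllabification, square brackets for IPA, angle brackets for origin, parentheses for usage). The class name, field names, the order of the rendered parts, and the sample values are assumptions for illustration; they are not copied from either dictionary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NounEntry:
    """Hypothetical model of a noun entry; field names are assumptions."""
    syllables: List[str]            # joined with vertical lines
    gender: str
    genitive_singular: str
    nominative_plural: str
    ipa: Optional[str] = None       # shown in square brackets
    origin: Optional[str] = None    # shown in angle brackets
    usage: Optional[str] = None     # shown in parentheses

    def render(self) -> str:
        # The order of the parts is an assumption of this sketch.
        parts = ["|".join(self.syllables)]
        if self.ipa:
            parts.append(f"[{self.ipa}]")
        parts.append(
            f"{self.gender}; {self.genitive_singular}, {self.nominative_plural}"
        )
        if self.origin:
            parts.append(f"<{self.origin}>")
        if self.usage:
            parts.append(f"({self.usage})")
        return " ".join(parts)


# Illustrative values only, not quoted from a dictionary.
entry = NounEntry(
    syllables=["Au", "to"],
    gender="n.",
    genitive_singular="Autos",
    nominative_plural="Autos",
    origin="Greek",
)
print(entry.render())
```

Optional fields are simply omitted from the rendered line when they are not set, which matches the observation that pronunciation and usage are given only where needed.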

The BROCKHAUS-WAHRIG (2011) shows a similar microstructure. As in the DUDEN,
each keyword in the BROCKHAUS-WAHRIG is semi-bold, and long, straight vertical
lines also indicate syllabification. New orthographic spellings after 1996 are printed
in blue. This also applies to the syllabification if it was changed during the reform of the 
orthographic rules. Additionally the keyword has a blue rhombus in front of it. If there 
are now two alternative spellings the BROCKHAUS-WAHRIG also recommends one of
them which is additionally underlined with blue. The grammatical information is also 
nearly the same as in the DUDEN. The stress is signalled by a point under a certain vowel 
grapheme for a short stressed (lax) vowel, or a horizontal line for a long stressed (tensed) 
vowel. The pronunciation is specified in square brackets with IPA symbols in case of 
doubt. The etymological origin is also given in square brackets, for example French 
or Latin. Sometimes the meaning of a keyword is specified. The usage of a keyword is 
implicitly given with examples or phrases. For each noun the gender is specified and a 
number refers to the corresponding declension table. If a noun cannot be assigned to a
particular declension table, one finds the genitive singular and the nominative plural 
within the word list. For each verb it is stated whether it is a transitive, intransitive, or 
reflexive verb. After that a number also refers to the corresponding conjugation table. 
For each preposition one can find the case it governs. 

Both dictionaries use blue information boxes within the wordlist for further grammatical
and orthographic details and further advice. Within them the DUDEN refers
to its own orthographic rules, whereas the BROCKHAUS-WAHRIG refers to the authorized
orthographic rules.

The choice of keywords in both spelling dictionaries follows similar conditions. In
BROCKHAUS-WAHRIG the keywords are based on a corpus which contains 1.8 billion
words. This corpus includes different genres of texts, for example national newspapers
and journals (BROCKHAUS-WAHRIG 2011: 8). In DUDEN the keywords are based
on a similarly structured corpus and on a historical database collected in the DUDEN
publishing house (DUDEN 2009: 9f.). In this respect, both dictionaries seem to represent
the so-called standard written language (see Eisenberg 2007).


2 ‘Truncation of any kind’ means the omission of a syllable or letters. The following keyword
exemplifies this: Wissbegier[de]. Wissbegier (‘curiosity’, ‘thirst for knowledge’) and Wissbegierde
(‘eagerness for knowledge’, ‘inquisitiveness’) both exist in the German lexicon and they are synonyms.

The dictionaries can differ in certain cases with alternative authorized spellings. 
These cases pertain to the periphery of the writing regularities. Thus it is neither surpris- 
ing that two alternative spellings exist, nor that the two dictionaries may sometimes rec- 
ommend the same but sometimes different alternative spellings (cf. DUDEN 2009: 15ff., 
BROCKHAUS-WAHRIG 2011: 10f. for the basis for the recommendations given). Table 18.1 
offers example recommendations for compound and syntagma spellings. 

In summary, one can say that both dictionaries provide more information than just 
the simple spelling of a certain unit. Whereas grammatical details and the choice of key- 
words are very similar, the recommendations of one specific spelling can follow differ- 
ent strategies. 


18.5 THE FUNCTION OF SPELLING 
DICTIONARIES 


Spelling dictionaries differ from other dictionaries due to their special function (cf. 
Nerius 1990 for European (written) languages, Nerius 2007: 351ff. for the historical 
development of German spelling dictionaries and especially for the role of the DUDEN). 
Whereas a dictionary of the German vocabulary uses the correct orthographic form 
of a written unit to ensure that the user quickly finds the written unit and its meaning 
or etymology, a spelling dictionary codifies the orthographic form. Normativity is the 
central functional feature of every spelling dictionary. But spelling dictionaries are only 
one side of standardized orthography; orthographic rules are the other side. As already 
mentioned, German features a doubly-coded orthography. Orthographic rules define 
how to spell. Spelling dictionaries by comparison apply the rules to examples. The
selection of keywords answers this purpose. Furthermore spelling dictionaries do not 
lay claim to completeness. A spelling dictionary does not need to contain every German 
word. A closer look at the microstructures of both of the most popular spelling diction- 
aries in German shows that neither exclusively shows the orthographic form of a written 
unit. Both dictionaries provide more than exclusively orthographic information. One 
can postulate that the German spelling dictionaries extend their function to that of a dictionary
of German vocabulary.

It may be useful at this point to make a brief comparison with the English written
language. English, like German, has a historically developed, systematic writing
system. But as opposed to German, elaborated spelling dictionaries such as DUDEN
or BROCKHAUS-WAHRIG or legalised orthographic rules do not exist for either British
or American English. The best-known English dictionaries, the Oxford English Dictionary
(OED) for British English and the Merriam-Webster dictionaries for American English,
are dictionaries of the English language. They contain much more
information than simply the spelling. In addition, there is The New Oxford Spelling
Dictionary. This dictionary lists the favoured British and American spelling forms for
over 110,000 words, including capitalization, hyphenation of compounds, and irregular
inflections. But it is not supported by an externally authorized orthography, and
it is very little known in the general English-speaking community. In Britain and the
United States, the OED and the Merriam-Webster dictionaries are much more prominent
than spelling dictionaries. In Germany, it is the other way round. Whereas every
layman knows the German spelling dictionaries, especially the DUDEN, no one knows
the dictionaries of the German language. In fact, the spelling dictionary of DUDEN is
regarded as a dictionary of the German language by most non-specialists.


3 Nerius (1990: 1300-1) compares the microstructure of spelling dictionaries of different European
languages and observes a difference between German spelling dictionaries and non-German spelling
dictionaries. Whereas the non-German dictionaries contain exclusively orthographic information, the
German dictionaries contain additional information such as that described in this section.

If we investigate the function of general English dictionaries with regard to spelling, 
two ideas seem worth pursuing. First, the dictionaries indirectly show the internal regu- 
larities of the written forms by listing these forms. Secondly, they indirectly and prescrip- 
tively influence the spelling of a certain word by listing one spelling and omitting others. 
(See Mugglestone, this volume, on the difficult relation between descriptive and prescriptive
dictionary-making, the presumed authority of lexicographers, and usage as the basis
of dictionary entries.) The first idea focuses on graphematic research questions, the second
one on orthographic questions. Of course both ideas are connected to each other.
For example, graphematic regularities change over the years, like any other grammatical 
system. Is the spelling of a certain form thus changeable over time, or is the etymological 
development stopped due to its entry in the dictionaries? (That the latter cannot entirely 
be the case is shown by the fact that the spellings listed can and do change from one edi- 
tion of a dictionary to another.) One could claim that the English written language has a 
singly and indirectly coded orthography, that is, the keywords in the dictionaries. 


18.6 THE DUDEN’S HISTORY, ITS SELF-PERCEPTION, AND ITS INFLUENCE 


This section considers the DUDEN as the most important German spelling dictionary. 
The DUDEN’s influence and prestige are based on a long history. In 1915 it was named after 
Konrad Duden, who published his spelling dictionary in 1880, applying the Prussian 
orthographic rules for schools to a large portion of the German vocabulary (cf. Nerius 
2007: 366f.). During Germany’s division after World War II, the DUDEN was also split 
into two independent editions, which were located in Mannheim (FRG) and Leipzig 
(GDR). Both editions developed differently in extent and layout (cf. Nerius 2007: 372). 
After German reunification, the first joint edition was published in 1991; its 
macro- and microstructure is modelled on the last DUDEN publication from 
Mannheim. Not surprisingly, the orthographic rules had a different structure in the 
Leipzig and the Mannheim publications. The Leipzig edition kept the traditional order 
of keywords according to their subject, whereas the Mannheim DUDEN changed its keyword 
order to an alphabetical one with the 18th edition in 1980 (cf. Nerius 2007: 373). This 
keyword order of the orthographic rules was maintained until the first orthographic 
reform in 1996. 

Before 1996, the DUDEN’s influence was immense, as the spelling dictionaries were 
elevated to the leading authority for spelling matters in both German states. In the 
FRG, the Standing Conference of the Ministers of Education and Cultural Affairs of 
the States (KMK) decided in 1955 that pending a revision of orthography, in addition 
to the spellings and rules of 1901, the spellings and rules used in the DUDEN were 
binding for school teaching and definitive in all cases of doubt (Bundesanzeiger 15 
December 1955, as cited in Nerius 2007: 373). According to Nerius (2007: 373), a cor- 
responding resolution is missing for the GDR, but ‘by appropriate regulation of the 
Ministry of People’s Education and by norms... for the printing businesses, a legiti- 
mation similar to the Duden’s was established’ (translation by the author). Even today 
the DUDEN is more accepted than the BROCKHAUS-WAHRIG (Mayr 2007: 22f., and 
footnotes 28, 29). 

From these premises follows the DUDEN’s self-perception, which according to 
Mentrup (2007) crystallizes in three roles with specific options for action: First, the 
DUDEN is the keeper of the officially sanctioned spelling. Secondly, the DUDEN is the 
extended arm of the government; it has to interpret and settle borderline cases on 
the basis of the norms or the use of language. For newly coined words or phrases, the 
rules have to be differentiated, determining the norms anew. Thirdly, the DUDEN is 
the highest language authority; it intervenes in language use in all matters of spelling 
and language in general (Mentrup 2007: 38f.). Accordingly, the editors of the 
DUDEN lead a double life between the conservation and the adaptation of the rules 
(cf. Böhme 2001 for the evolution of the rules). This becomes clear if rules in successive 
DUDEN editions are compared: they change gradually (cf. e.g. Schaeder 1985 on 
syntagma/compound spelling and hyphen rules). Mentrup (1983), in his own pro- 
posal for a reform of punctuation, is more explicit in calling the punctuation rules 
‘the illegal daughter of state sanctioned orthography’ (1983: 6). He argues that these 
rules—except for the hyphen and the apostrophe—were discussed at neither the 
first nor the second Orthographic Conferences of 1876 and 1901 (1983: 2, 6). They 
appeared for the first time in 1915 in the ninth edition of the DUDEN and thus lack 
official legitimation (Mentrup 1983: 7). One may or may not agree with this inter- 
pretation, but one thing is certain: with the revision of orthography, the role of the 
DUDEN has to be re-evaluated. 

In this regard I would like to turn to the relationship between the DUDEN’s corporate 
orthography and the authorized spelling rules. In the following, I will take a closer 
look at apostrophe spellings. These spellings do not belong to the core graphematic 
principles, but nevertheless both the spellings and the respective rules have been rather 
passionately discussed. The use of the apostrophe can be divided into a stem form apostrophe 
and an elision apostrophe (Klein 2002). The latter is defined by the omission of 
linguistic material, see example (5). 


(5) a. <heil’ge> for <heilige> 
‘holy’, inflected, e.g. FEM.SG. 
b. <geht’s> for <geht es> 
coll. ‘are you OK’ 


The stem form apostrophe, on the other hand, usually segments stems from inflectional 
morphemes (cf. especially Klein 2002; Bunčić 2004; Bredel 2008: 102ff.; Bankhardt 2010; 
Buchmann 2015: 93), see example (6). 


(6) a. <Susi’s> for <Susis> 
proper noun, inflected, prenominal genitive SG. 
b. <Nudel’n> for <Nudeln> 
‘noodles’, inflected, PL. 


While elision apostrophes are orthographically licensed, stem form apostrophes are 
usually not (see (6b)). With respect to the DUDEN, the following can be stated. In its 
corporate orthography (the so-called K-rules), the authorized rules are laid out with reference 
to the respective paragraphs. Moreover, the reader is given further advice. This 
advice, however, is not part of the official spelling rules. For the stem form apostrophe, 
the scope of the application of the rules is even widened (consciously or unconsciously) 
by using example words that do not occur in the official rules. In K 16, 2 of the corporate 
orthography one finds the following information: 


Occasionally, the apostrophe is not used as a sign for elision but to clarify the basis of 
a personal name, as in the following cases: 


b) before genitive -s <§97E>, 

Andrea's Blumenecke [‘Andrea’s Flowercorner’] (to differentiate from the male per- 
sonal name Andreas) 

Willi’s Würstchenbude [‘Willy’s Sausage Shack’] 


(DUDEN 2009: 4, translation by the author) 
The authorized spelling rules instead state: 


§97 An apostrophe can be written if words with omissions in spoken language are 
non-transparent in writing. 
der Kapt’n [‘the captain’], mit’m Fahrrad [‘with a/the bike’] ... 
E: The apostrophe as an elision marker has to be distinguished from the occasional 
use of this sign for clarifying the base form before the genitive suffix -s: Carlo’s 
Taverne [‘Carlo’s tavern’] 


(Official Spelling Rules 2006: 99, translation by the author) 


The rule is the same, but the examples are not. The authorized rules use, consciously 
or not, an ambiguous name as an example: Carlos. There are two male names, Carlos 
and Carlo. The apostrophe clarifies which one is meant, namely the tavern of Carlo and 
not Carlos. This distinction of two names is taken up in the K-rules of the corporate 
orthography with the example Andrea’s Blumenecke. Here, the female name Andrea is 
distinguished from the male name Andreas by the use of the apostrophe. In the second 
example, however, the DUDEN chooses a male name without a corresponding possible 
name ending in -s: Willi. The authorized rules state that two names which only differ 
in the presence or absence of a final -s can be distinguished with the apostrophe if they 
occur in the position of a prenominal genitive: Carlo’s Taverne is the tavern of Carlo. The 
spelling of the tavern of Carlos is also regulated in §96(1): Carlos’ Taverne. Both names 
are unambiguously marked by the apostrophe in the position of the Saxon genitive. §96 
is obligatory, §97 is optional. The authorized spelling rules thus offer the possibility of 
double marking of names. Depending on the position of the apostrophe, it is either one 
name or the other. For the name Willi, on the other hand, there are no doubts as to which 
name it is. If one is to regard the examples as part of the rules that serve to clarify the 
formulation, then the DUDEN interferes with the official spelling rules in its corporate 
orthography because it extends the area of application for the apostrophe from ambigu- 
ous names to all names. This interpretation is not made explicit at any point. Regardless 
of this, the rule expansion is accepted within punctuation research (cf. the comments 
to the spelling reform in Gallmann and Sitta 1996; Klein 2002; Camenzind 2007; Bredel 
2008: 102ff.; Scherer 2010). Whether this rule expansion was initiated by the reform- 
ers themselves or whether the DUDEN launched it in its corporate orthography must 
remain subject to speculation. The manner of implementation of the authorized spelling 
rules in the DUDEN’s corporate orthography has to be interpreted as a mixed blessing. 
On the one hand, the DUDEN picks up the official spelling rules and exemplifies 
them. On the other hand, one can negatively interpret this practice as Mentrup does. He 
assumes that the DUDEN persists in its authority of interpretation even after the spelling 
reforms of 1996 and 2006 although the Rat für deutsche Rechtschreibung has the authority 
of interpretation (cf. Mentrup 2007: 44). 

From a graphematic perspective, the deletion of the authorized spelling rules 
between the 24th and the 25th editions of the DUDEN is perhaps another reason to re-evaluate 
its self-perception. Connected to that, a second question arises: what does the deletion 
of the officially legalised rules in the DUDEN mean for the double-coded orthography? 
First, both parts of orthography, the authorized orthographic rules and the examples 
within the wordlist, need not be reproduced in the dictionary. Both parts are published 
on the web page of the Rat für deutsche Rechtschreibung. Of course the Rat für deutsche 
Rechtschreibung provides a different wordlist from the one the DUDEN provides in its dictionary. 
Secondly, another dictionary exists in the form of the BROCKHAUS-WAHRIG, which 
contains both parts of orthography: the authorized rules and a wordlist. The user is not 
solely dependent on the DUDEN. But the DUDEN is the more popular dictionary and it 
is used in schools, so it is conspicuous that the DUDEN preserves its corporate orthography 
whereas it deletes the authorized rules. 


18.7 AN ALTERNATIVE GERMAN SPELLING DICTIONARY, WITH A DIFFERENT APPROACH: DAS RECHTSCHREIBWÖRTERBUCH BY ICKLER (2000) 




In addition to DUDEN and WAHRIG, there are other spelling dictionaries as well. In 
2000 the second edition of Ickler’s spelling dictionary, Das Rechtschreibwörterbuch, 
was published. Ickler strongly criticizes the reform of the orthographic system and 
postulates that as long as no one can propose an orthographic rule system which is 
coherent and commands common consent, one should stick to the traditional orthog- 
raphy (2000: 11). What does that mean? Ickler’s points of criticism cover various differ- 
ent issues (2000: off.). First, he criticizes the fact that the system of rules was changed 
because of shortcomings in learnability instead of for functional reasons. But learn- 
ers’ problems did not disappear after the reform. Secondly, the reformed orthographic 
rules cause grammatically incorrect spellings and violate the naturally developed 
graphematic system. For that reason Ickler judges the reformed orthography as much 
more difficult than the one before 1996. It is thus not surprising that he preserves the 
traditional orthography. His spelling dictionary contains three parts. The first part 
summarizes a sample of orthographic rules, which Ickler holds to be the principal 
ones (2000: 14ff.). The second part describes these principal orthographic rules in a 
more detailed way (2000: 31ff.). The third part contains the alphabetic wordlist which 
includes a choice of keywords (2000: 73ff.). Analysing the microstructure, it becomes 
clear that this is just a spelling dictionary. In contrast to DUDEN and BROCKHAUS-WAHRIG, 
Ickler only gives information about the spelling itself and the separation at 
the end of a line. The latter is signalled by a long vertical line. Beyond that the reader 
only receives explicit information about the grammatical behaviour, etymology, pro- 
nunciation, and meaning of a certain unit if they are useful for identifying this form. 
One can compare this pure dictionary structure with information in English spelling 
dictionaries (see Section 18.4 for the function of German spelling dictionaries com- 
pared to English spelling dictionaries). 
18.8 CONCLUSION 


From a German perspective, it is important to distinguish graphematics and orthog- 
raphy. Comparing the English system with the German one, it becomes obvious that a 
writing system that has developed historically does not necessarily lead to an author- 
ized orthographic rule system. German has a double-coded orthography. The author- 
ized orthographic rules constitute the first part; spelling dictionaries constitute the 
second part by listing words as examples of the orthographic rules. Thus spelling dic- 
tionaries have a very special function in German. Analysing the macrostructure of the 
most popular German spelling dictionaries has shown that there is no need to print the 
authorized orthographic rules within a spelling dictionary. In fact, the DUDEN deleted 
the authorized rules in its latest edition. Instead it comprises a corporate orthography. 
This is closely connected with the self-perception it has acquired during its long history. 
The BROCKHAUS-WAHRIG shows other characteristics: it does not establish a corporate 
orthography and it conserves the authorized orthographic rules. Analysing the micro- 
structure of these spelling dictionaries, one can observe that other grammatical infor- 
mation is given, but this is not the main function. 


CHAPTER 19 




JULIE COLEMAN 


19.1 INTRODUCTION 


SLANG lexicography is, in almost every respect, more challenging than the lexicography 
of Standard English. Slang is typically used with greater flexibility of meaning and gram- 
mar than Standard English, but it has generally been harder to document because evi- 
dence of usage has been less readily available. For this reason, providing dates of usage 
for slang terms is particularly difficult. Moreover, the spelling of slang terms often varies 
and different forms may be preferred at different times, so even selecting an appropriate 
headword form can be problematic. Slang etymologies can also be notoriously difficult 
to pin down: theories abound but proof is often lacking. Determining the pronunciation 
of slang words is also problematic, because each user will pronounce them according 
to their own accent. Representing their pronunciation is harder still, because there is 
no appropriate neutral form: an RP transcription may produce a pronunciation that is 
comically unlikely. An additional difficulty is that slang revels in subjects likely to rouse 
strong emotion, so the slang lexicographer treads a dangerous boundary between pre- 
cise definition, obscenity, and offence. 

Although slang offers many additional challenges to the lexicographer, its socio- 
linguistic functions are such that many amateur and casual lexicographers have 
tried their hand at documenting slang. For this reason, slang dictionaries occupy an 
extremely broad spectrum in terms of scope, methodology, presentation, and quality. 
The challenges that preoccupy one slang lexicographer may not cross the mind of 
another. Limitations that infuriate some users of a slang dictionary will be immaterial 
to many others. 

The online availability of informal written conversation offers new possibilities and 
new challenges to the slang lexicographer. After many years of making do with an unset- 
tling paucity of data, slang lexicographers must now adjust to having more informa- 
tion than they can possibly process. New modes of slang lexicography have emerged in 
recent years that could offer a digested version of this online information, but they more 
often add to the volume of data available without contributing towards the development 
of greater clarity. 


19.2 INCLUSIONS AND EXCLUSIONS 


One of the main challenges for slang lexicography lies in defining what slang is. It is 
particularly difficult to establish the boundary between slang and colloquial language, 
and many books that are marketed as slang dictionaries include or even concentrate on 
colloquialisms. In truth, there is no definitive dividing line between the two, but fac- 
tors that might be taken into account in determining that a term is colloquial rather 
than slang include length of use, geographical spread, social distribution, and context. 
A term or phrase that could be used at a multi-generational family gathering without 
comment or incomprehension is probably colloquial rather than slang. Conversely, 
terms and phrases restricted to groups with a common interest, a shared activity, or a 
similar age are more likely to be slang than colloquial. Terms restricted to specific pro- 
fessions, social classes, or ethnic groups have also been considered slang in the past, 
but scholars now generally treat these terms as belonging to jargon, sociolects, and eth- 
nolects. Unfortunately for the slang lexicographer, non-standard lexis is extremely fluid, 
and a term might simultaneously be colloquial in one context and slang in another, 
even though it is being used with the same meaning. Equally, a term that belongs to 
one person’s ethnolect may function as slang when it is adopted by people outside that 
ethnic group. This is a particular minefield for those attempting to compile a diction- 
ary of another national variety of English, in that there is a temptation to assume that 
everything unfamiliar is slang. For example, The British Slang Dictionary (<http://www.coolslang.com/>) 
includes daft ‘stupid, foolish’, hoover ‘to vacuum’, loo ‘toilet’, and mum 
‘mother’, none of which would be considered slang by most of their users. The Oxford 
English Dictionary (OED) labels daft as ‘Now chiefly Sc. and north’ and mum as ‘informal’. 
Hoover and loo are not labelled in any way in the OED, suggesting that they are 
unmarked in British English. 

Although many slang dictionaries do not provide an explicit definition of slang, 
their contents appear to suggest that, in addition to the categories of terms listed 
above, it can sometimes encompass dialect, swearing, obscenity, euphemism, catch- 
phrases, nicknames, non-standard pronunciation, anecdotes, jokes, and encyclopae- 
dic information. A dictionary of drugs slang (e.g. What About Drugs) may include 
information about the chemical composition and legal status of a drug, as well as 
about how it is taken, the usual dosage, and its effects on the user. This type of diction- 
ary fulfils a practical role that overrides normal lexicographic practice, and can only 
be evaluated with reference to its functionality. A dictionary of Texan slang (e.g. This 
Dog’ll Really Hunt) may list terms used exactly as in Standard English, semantically 
and syntactically, and differing only in their pronunciation. This type of dictionary 
plays a role in the enregisterment of a dialect, and its contents have to be understood 
as an expression of group identity rather than as a statement about the nature of slang 
itself (Agha 2003; Coleman 2010b). 

Since many slang dictionaries present the language of a particular group of people, 
their compilers have to decide whether or not to include slang that is also used by people 
outside the group. For example, one dictionary of military slang may exclude general 
slang used in military contexts, while another may include it.¹ In national slang dictionaries 
there are identifiable trends in inclusion policies. Dictionaries of non-standard 
Australian English tend to exclude American terms that are also used in Australia, while 
dictionaries of non-standard American English tend to include terms that are also used 
in Australia and elsewhere, regardless of their country of origin.² Scholarly slang dictionaries 
will usually provide a description of and rationale for their inclusion policy, 
but many popular dictionaries do not, and this can be misleading for dictionary users. 
For example, having found a term in a dictionary of British slang, it is no great leap to 
assume that the term is British slang: not only that it is used throughout Britain, but 
also that it is not used in other English-speaking countries. Such assumptions will often 
be incorrect, and slang lexicographers should anticipate ways in which their intentions 
might be misinterpreted. 

Historical slang lexicography poses the additional problem of dealing with terms that 
have passed from slang into colloquial or Standard use. In his Dictionary of Slang and 
Unconventional English (DSUE), Partridge fudged this issue by including a great deal 
of colloquial language, often trying to date the movement between different registers. 
However, his coverage of Standard English terms that were once slang is less thorough 
than it might be. Indeed, unless contemporaries commented on a term’s status as slang 
when it was first used, it is often impossible for modern readers to reconstruct the social 
cues coded by lexical choices in written texts, which may offer the only evidence now 
available. 


19.3 DOCUMENTING SLANG USAGE 


A further difficulty facing slang lexicographers is that slang belongs more fully to the 
spoken than the written language, making it particularly hard to document using the 
traditional methods of historical lexicography. Nevertheless, many slang dictionaries, 
particularly the historical ones, have been compiled largely on the strength of written 
evidence. Partridge addressed this problem in the DSUE by assuming that slang terms 
had been in spoken use for several decades before they were recorded in writing, and 


1 Compare Lighter’s ‘Slang of the American Expeditionary Forces in Europe, 1917-1919’ with Brophy 
and Partridge’s Songs and Slang of the British Soldier. 

2 Compare G. A. Wilkes, A Dictionary of Australian Colloquialisms with Robert L. Chapman, New 
Dictionary of American Slang. 
adjusted his dates accordingly (Coleman 2010a: 12-14). For example, DSUE dates taps 
‘ears’ to the period between the middle of the eighteenth and nineteenth centuries: 


taps. The ears: ?mid. C.18-mid.19. Because they tap conversation (F. & H.) 


Partridge’s sole source appears to be the entry in Farmer and Henley’s Slang and its 
Analogues (1890-1904), which reads: 


TAP, subs. ...2. (old).—In pl, = the ears... 


Partridge’s date range assumes that Farmer and Henley’s entry is the tip of an iceberg of 
underlying usage, and later dictionary-makers have made similar assumptions about 
Partridge’s entries. 

While there may be some examples of frequently used slang terms in large linguistic 
corpora, it is unlikely that many ephemeral terms or those used by narrowly defined 
groups will be registered even if the corpus includes spoken material. Slang lexicog- 
raphers have tended to seek out sources likely to contain slang, such as song lyrics or 
newspaper articles, films, books, or television programmes about relevant aspects of 
contemporary culture. This can produce a distorted view of slang, in that a lexicogra- 
pher who believes slang to be characteristically masculine, for example, might not look 
for examples of women's slang, or might not recognize it as slang when he sees it. Slang 
may be an urban phenomenon, but the slang of capital cities tends to be much better 
documented than the slang of provincial ones: terms used by apprentice tailors in 
nineteenth-century Manchester will have been documented by dialect lexicographers, 
if at all, but terms used by apprentice tailors in nineteenth-century London certainly 
came to the attention of slang lexicographers. 

In addition to his written sources, Partridge also based entries on overheard conver- 
sations (Nichols 1967) and on letters from correspondents. For example, the seventh 
edition of DSUE (1970), the last on which Partridge worked, includes: 


rhubarb. ... Alexander McQueen, an Englishman long resident in the United States, 
writes thus (on Aug. 31, 1953): 


‘Here’s a genuine one that I’ve never seen recorded anywhere; I’ve been looking for 
it for half a century! It is the word “rhubarb” used as a theatrical term. ... When a few 
actors gathered backstage and represented the “noise without” made by a mob, they 
intoned the sonorous word “rhubarb.” ... I have only met one old-time actor in the 
United States who knew about this custom; and he was from England.’ 


zob. Very nice: Plumtree School, Southern Rhodesia: since ca. 1920. (A. M. Brown, 
letter of Sept. 18, 1938.) Arbitrary? 


While it is entirely probable that most of Partridge’s correspondents provided him with 
information in good faith, this practice left him open to mischievous contributions and 
to allegations that his wordlist was unreliable (e.g. Legman 1951). 
Another danger of this model of slang lexicography is that it relies too heavily on 
self-conscious representations of slang on television and in plays, novels, films, and 
songs. Slang terms often live on in stereotype long after they have fallen from everyday 
use, and slang dictionaries can play a role in perpetuating these stereotypes. For exam- 
ple, dictionaries of ‘beatnik slang’ were used to promote television programmes, films, 
and consumer products well into the 1960s although the genuine counterculture had 
continued to develop in new directions.³ 

Amateur slang lexicographers can provide a useful counterbalance to more 
scholarly dependence on written sources, in that they are often based on 
insider-knowledge of a slang-using group. For example, during the 1960s school- 
teachers in the United States began to encourage their pupils to compile and some- 
times to print dictionaries of their own slang. It would be unrealistic to expect 
coursework dictionaries to achieve professional standards of lexicography, but 
these small booklets do offer independent evidence of slang usage to supplement 
the information available elsewhere. For example, A Guide to Slang, a school slang 
dictionary of 1966, includes: 


HANG IN THERE ... (hangin thar) ... keep trying [antedating OED 1969] 
SOUIZ ZY ... messy [antedating OED 1969] 
SPASTIC ... (spaz'tic) ... n. ... a person uncoordinated [antedating OED 1981] 


For these fledgling slang-lexicographers, the main challenges appear to be defining 
words clearly and labelling them accurately. Spelling Standard English words can also 
be a challenge for some amateur slang lexicographers, but it can only be a good thing for 
the history of slang that slang-users have not let a lack of training or linguistic knowl- 
edge stop them documenting contemporary usage. 

During the 1930s, scholars began to realize the importance of documenting spo- 
ken language from a sociological perspective, although sociolinguistics did not yet 
exist as a discipline. In order to overcome the difficulties of recording terms used 
only in the informal speech of closed groups, they travelled with hoboes, consorted 
with pimps, and interviewed criminals, thus achieving an academic perspective on 
the inside-information they collected (e.g. Minehan 1934; Milner and Milner 1973; 
Maurer 1981). Unfortunately, collecting lexical information in this way is ethically 
dubious, labour-intensive, and potentially dangerous. In addition, academic employ- 
ment offers no guarantee of lexicographical skill, and many of these scholarly lexicog- 
raphers received their training in fields other than linguistics. Their glossaries are often 
peripheral to the central aims of the study and are generally compiled with little or no 
additional research. Like the insider-dictionaries discussed in the previous paragraph, 
these scholarly-observer dictionaries provide useful information about slang usage, but 
the standard of lexicography is often out of kilter with the academic credentials of the 


3 These include The Beat Generation Dictionary (1959), Your Vaseline Hair Tonic Flip-Talk Contest 
Booklet (c.1961), and TeenAge Slanguage Dictionary (1962). 
compiler. If carefully and accurately compiled, the results of this type of study may provide 
a detailed snapshot of the usage of a narrowly defined group, but it is impossible 
to generalize from them. Maurer, for example, compiled many lists of the slangs of dif- 
ferent criminal groups, but he did not succeed in producing the general dictionary of 
criminal slang that he had hoped to write (McDavid 1982). The sociologist’s desire to 
delineate specific usages was not compatible with the lexicographer’s need to group cita- 
tions and summarize meaning. 


19.3.1 Pinning Down Meaning 


The meaning of slang words is never fixed and they are often used with closely related 
meanings in different contexts. Lexicographers of slang have to decide whether they 
are going to group related meanings together (known as lumping) or distinguish them 
from one another (splitting). For example, the OED’s second definition for smashed is 
‘Intoxicated, drunk; under the influence of drugs; “stoned”. slang (orig. U.S.)’, with citations 
from between 1962 and 1977. Green’s Dictionary of Slang (GDS), a historical account 
of slang around the English-speaking world, divides this into ‘very drunk’ (1834-2007) 
and ‘intoxicated with a drug, esp. cannabis or LSD’ (1961-2005), and adds the senses 
‘infatuated with’ (1888) and ‘very tired’ (1989). The GDS sense divisions offer more information 
about specialized uses, but the OED’s emphasis on the mental state rather than its 
cause may be a better reflection of how the word is used in relation to drugs and alcohol. 
Sometimes it is ambiguous in individual contexts, but in other cases there is no clear dis- 
tinction between these two meanings: people may set out to get smashed without being 
particular about how they do it. The same argument might be made for stoned, for which 
the OED provides separate definitions relating to drink and drugs. 


19.3.2 Plagiarism vs. Research 


Before modern copyright law, the starting point for many slang lexicographers was to 
comb through competitor dictionaries in order to appropriate a large proportion of their 
contents. Francis Grose and John Camden Hotten are among those who have proudly 
drawn attention to their antiquarian approach.⁴ Other slang lexicographers have played 
down or concealed their dependence on earlier sources.⁵ The possibility of legal repercussions 
appears sufficient to ensure that modern slang lexicographers do not appropriate 
their predecessors’ work in this way, but there is a fine line between appropriation 
and citation, and big slang dictionaries have always fed on little ones. For example, 


4 Francis Grose, A Classical Dictionary of the Vulgar Tongue (1785), John Camden Hotten, A 
Dictionary of Modern Slang, Cant, and Vulgar Words... By a London Antiquary (1859). 

5 For example, George Matsell, Vocabulum, or, The Rogue's Lexicon (1859), James Maitland, The 
American Slang Dictionary (1891). See Coleman (2004: 90-100), Coleman (2009a: 155-60). 
Maurer resented the thoroughness with which Partridge documented the terms listed
in his glossaries of drugs and criminal slang, even though Partridge almost invariably 
acknowledged his source (Coleman 2011). This dictionary cannibalism can also create 
unsuspected difficulties for a reader. For example, GDS provides the following evidence 
for autem quaver: 


1725 New Canting Dict. n.p.: AUTEM-QUAVERS the Sectaries call’d Quakers, who first
began their Schism by quavering, shaking, and other ridiculous gestures. [Ibid.]
AUTEM-QUAVER-TUB a Meeting-House, particularly for Quakers. 1737, 1759, 1760, 1776
BAILEY Universal Etym. Eng. Dict. 1785, 1788, 1796 GROSE Classical Dictionary of the
Vulgar Tongue n.p.: AUTEM QUAVERS, quakers. [Ibid.] AUTEM QUAVER TUB, (cant) a
quakers meeting-house. 1809 G. ANDREWES Dict. Sl. and Cant. 1811 Lex. Balatronicum
[as cit. 1785]. 1823 ‘JON BEE’ Dict. of the Turf, the Ring, the Chase, etc. 1835 G. KENT
Modern Flash Dict. 4: Autem quaver's tub, a quaker’s meeting house. 1848 Flash Dict.
in Sinks of London Laid Open. c.1850 DUNCOMBE New and Improved Flash Dict.
n.p.: Autumn quaver’s butt quaker’s meeting house. 1871-81 B. M. CAREW Life and 
Adventures. 


The historical slang lexicographer has to find a way to indicate that a dense citation 
paragraph like this one does not actually suggest that the term enjoyed long and 
extensive use: that these citations are all from inter-related slang dictionaries copy- 
ing from one another. The only independent evidence is the first dictionary cited, 
which means that the OED formulation ‘and later dictionaries’ might be a useful 
way of acknowledging the limitations of relying on this type of source. For histori- 
cal slang lexicographers, this would draw attention to the uncomfortable fact that 
earlier dictionaries often offer the only evidence there is that a slang word ever 
existed. 
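
One way of applying the ‘and later dictionaries’ formulation is sketched below. This is a minimal illustration under stated assumptions, not a description of how the OED or GDS compiles entries: the Citation class, its independent flag, and the summarize function are hypothetical names, and the flag values simply encode the judgement made above that only the 1725 citation counts as independent evidence. The short titles are a subset of those in the citation paragraph for autem quaver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    year: int          # date of the cited edition
    source: str        # short title as given in the citation paragraph
    independent: bool  # editorial judgement: not copied from an earlier dictionary


def summarize(citations):
    """Keep independent citations; collapse copied ones into a single note."""
    ordered = sorted(citations, key=lambda c: c.year)
    kept = [f"{c.year} {c.source}" for c in ordered if c.independent]
    if any(not c.independent for c in ordered):
        kept.append("and later dictionaries")
    return "; ".join(kept)


# A subset of the GDS evidence for autem quaver quoted above.
autem_quaver = [
    Citation(1725, "New Canting Dict.", True),
    Citation(1785, "GROSE Classical Dictionary of the Vulgar Tongue", False),
    Citation(1811, "Lex. Balatronicum", False),
    Citation(1835, "G. KENT Modern Flash Dict.", False),
]

print(summarize(autem_quaver))
```

The hard part remains the editorial judgement stored in the independent flag; the code only shows how that judgement, once made, can be kept from inflating the apparent evidence for a term.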

Reliance on dictionary sources is a particular problem for the slang lexicographer 
who is attempting to achieve broad coverage. Leaving out a word for which little infor- 
mation is available is a dangerous option, because it might become the latest media 
buzz-word in the time between the submission of the manuscript and the dictionary’s 
publication. Users comparing one dictionary with another will tend to be impressed by 
the one that is more inclusive, and slang lexicographers who fail to take account of the 
contents of earlier dictionaries are rarely forgiven. It is safer to put everything in, even 
terms not known to the lexicographer and for which the evidence of use is shaky. For 
example, Partridge made apparently uncritical use of Matsell's carelessly plagiarized 
Vocabulum in his own Dictionary of the Underworld, and many of the entries for British 
slang in The New Partridge Dictionary are based on single citations from earlier slang 
dictionaries. For instance: 


L-plate; L-plater noun 

a prisoner serving a life sentence UK 

A play on L for ‘life’ and the L-plates that signify learner-drivers. 
• —Angela Devlin, Prison Patter, p.71, 1996
undoubtedly fed by his thwarted hopes of an academic career (Coleman 2011), and he
is not alone among slang lexicographers in having felt undervalued and unappreciated. 
Large dictionary projects are notoriously precarious even when they have the backing of 
wealthy publishing houses, and slang dictionaries rarely do. 


19.4.2 Quality vs. Profit 


Some of the short-cuts taken by slang lexicographers can be explained by the unfa- 
vourable ratio of cost to sales. To undertake slang lexicography to a high standard is 
extremely expensive, involving many hours of laborious searching, analysis, and check- 
ing. However, the income that slang lexicography provides is limited and uncertain. 
A scholarly slang dictionary that has taken ten or more years to compile might appeal 
less to the buying public than a brightly coloured paperback dashed off in response to 
a current fad. In response to this, publishers do what they can to maximize revenue, 
and previously published dictionaries are sometimes reissued under new titles with
little or no change in their contents.⁷ These relationships are made explicit on the
copyright page and it is unlikely that many individual purchasers would make the mistake
of buying the same dictionary twice, but libraries sometimes do. The
reverse of re-titling is seen in Routledge’s decision to retain Partridge’s name in the title 
of Dalzell and Victor's dictionary although it is an entirely new work. The New Partridge 
Dictionary of Slang and Unconventional English was compiled with entirely different 
parameters than Partridge’s DSUE, in that it includes American slang and concentrates 
on the post-war period. The New Partridge overlooks evidence available in DSUE, but its 
sales were undoubtedly helped by association with Partridge’s name. 

Publishers of large slang dictionaries often produce concise or abstracted versions, 
and the production of updated editions can also help to generate new sales, although 
they may provide relatively little new material. Seven editions of Partridge’s DSUE 
appeared during his lifetime, with later editions offering fewer new terms each time. 
Poor coverage of contemporary slang might pass unnoticed in a large-scale historical 
dictionary, but Partridge also published a slimmer version, the Smaller Slang Dictionary, 
concentrating on twentieth-century slang. This was also revised with a very light touch, 
and here the poor coverage of contemporary slang must have been much more appar- 
ent. Another shorter compilation, Partridge’s Concise Dictionary, based on the eighth 
edition of the DSUE, is presented as a dictionary of contemporary slang, but it makes 
the odd decision to include only terms originating in the twentieth century, and asks 
the reader ‘please, before you angrily denounce this dictionary for omitting your 
own favourite informal usage, pause to reflect: the word may be older than you think’ 
(Partridge’s Concise 1989: vii). I have not yet reviewed the proportion of new material 


7 For example, Robert L. Chapman's The Macmillan Dictionary of American Slang (1995) is the same 
as his New Dictionary of American Slang (1986). Similarly, Rosamund Fergusson's The Past Times Book of 
Slang (1999) is a reprint of her Shorter Slang Dictionary (1994).
contained in the satellite dictionaries emanating from The New Partridge, but their rapid
appearance suggests relatively little new lexicographical input.⁸

While the more scholarly slang dictionaries observe some of the same presenta- 
tion conventions as scholarly dictionaries of Standard English (small typeface, narrow 
columns, use of bold for headwords, etc.), it is possible to characterize popular slang 
dictionaries by their nonconformity of appearance. There tends to be more white 
space on the page, with definitions and information about part of speech often being 
presented on a new line. Generous gaps between entries and the practice of begin- 
ning each letter of the alphabet on a new page can also increase the amount of white 
space. This allows publishers to produce a book of reasonable size despite its limited 
contents. Rather than considering this a cynical marketing ploy, we could see it as a 
result of popular slang dictionaries’ status as luxury items. Popular slang dictionaries 
frequently include illustrations, ranging from the helpful to the cryptic, with many 
making use of humour, and these inevitably increase the cost of production as well as 
the appeal to potential buyers. In some slang dictionaries, the artwork is more
important than the definitions, and a few are objects of art in themselves, with lavish
bindings and elaborate scripts.⁹

Dictionaries of Standard English tend to adopt an impartial authoritative tone, 
which may conceal biases in their coverage and treatment of vocabulary (Murphy 1998; 
Mugglestone 2007). In contrast, many slang dictionaries, particularly those written for 
the popular market, have a clearly identifiable voice. For example, John Blackman, a 
radio and television presenter, wrote his Best of Aussie Slang in an informal style that 
assumes shared knowledge, values, and even gender: 


bag, the old 


Uncomplimentary but affectionate term for either your wife or mother. (God knows 
what they call us behind our backs!) 


Mondayitis 

That dreadful feeling that overtakes you at the beginning of your working week. 
‘George won't be in today—he’s got a bad case of Mondayitis.’ It is also acceptable to
come down with Mondayitis on a Tuesday, Wednesday, Thursday or even Friday! 


A dictionary published by somebody who is already well-known can afford to address 
itself to a well-defined target audience, safe in the assumption that the people for whom 
it is written will find out that it is available. However, other dictionary-writers tread a 
more difficult path in trying to define potentially offensive terms sensitively without 


8 They include Tom Dalzell and Terry Victor, Sex Slang (2007), Vice Slang (2007), and The Concise
New Partridge Dictionary of Slang and Unconventional English (2007), and Tom Dalzell, The Routledge 
Dictionary of Modern American Slang and Unconventional English (2008). 

9 For example, John Lawrence, Rabbit and Pork, Rhyming Talk (1975), Lady Kier Kirby, The 376
Deee-lightful Words (1992), and Ashleigh Talbot, Beat Speak: An Illustrated Beat Glossary (1996).




reducing their target market. For example, Puxley’s Britslang tends to distance itself 
from the very many politically incorrect terms that it includes, not always entirely 
convincingly: 


Macaroon Coon An offensive reference to a black person. 

Near Enough Puff (Homosexual) An example from pre-politically correct times, 
when the close approach from behind of someone so inclined would draw the 
remark: ‘That's near enough.’ Sometimes said as ‘nigh enough’.

Raspberry Ripple... Nipple That of a woman, a term used solely by men for whom 
it’s a bit of a mouthful, so it's always a ‘raspberry’. 


Because slang is seen as the language of rebellion, slang dictionaries have some- 
times been compiled to promote a political position. For example, Bruce Rodgers’ 
Queen's Vernacular is at once an expression of personal identity and a celebration of 
Californian gay lifestyles in the 1970s. From the other end of the political spectrum 
come publications like Granger and Crow’s Rock Culture Glossary, which sought to 
expose the insidiously corrupting influence of popular music. Since slang dictionaries 
usually make no pretence of impartiality, these biases are generally clear to the reader 
and might be considered less problematic than the undeclared biases of mainstream 
lexicography. 


19.5 ONLINE SLANG LEXICOGRAPHY 


The internet has opened up slang lexicography to anyone who wants to become involved
in it. The best-known but by no means the only example of online slang lexicography 
is Urban Dictionary (<http://www.urbandictionary.com>). Contributors to Urban 
Dictionary offer multiple definitions for slang terms, which are ranked according to 
other users’ expressions of preference, often with little regard to their accuracy. There 
is no guarantee that these casual slang lexicographers actually use the terms they are 
documenting. If dozens of people assert, apparently independently, that they use a par- 
ticular word in a particular way, they have not produced a slang dictionary, but they 
have produced valuable raw materials for one. Unfortunately, contributors’ energies 
are often diverted into abuse and uninformed arguments about etymology and usage. 
Urban Dictionary is a useful starting point for definitions of unfamiliar slang terms, but 
a term’s inclusion on the site does not prove its use (compare the discussion in Peckham 
and Coleman 2014). 

Other online slang dictionaries have tried to harness slang users’ first-hand knowl- 
edge more constructively by asking for more detailed feedback on usage. For example, 
The Online Slang Dictionary (<http://onlineslangdictionary.com>) asks users to indi- 
cate whether or not they have heard or use a term, where they live, and how vulgar they 
consider it to be. Unfortunately, these statistics are difficult to interpret. For example, no
one in Australia or New Zealand has registered that they use the term bitch, but this may
just mean that few Australasians have accessed the site. A list of the one hundred terms 
rated most highly for vulgarity includes only sexual terms, suggesting that voters were 
influenced more by meaning than usage. For example, the jocular euphemism one-eyed 
trouser snake receives a 79 per cent vulgarity rating, while the dysphemistic kark it ‘to 
die’ (more usually spelt cark) is rated at only 14 per cent. 

Online slang dictionaries are often more interested in optimizing their advertis- 
ing revenue than improving the quality of their entries. Allowing multiple submis- 
sions without any requirement that contributors look first at what is already there is a 
good way of growing a website's commercial potential, particularly if contributors are 
provided with the rapid gratification of seeing their definition online. Asking users to 
review and improve someone else’s definition filters out those who are not willing to give 
up more than a few minutes of their time. There is thus an inverse correlation between 
profit and the quality of users’ contributions. Collaboratively written slang dictionaries 
do eventually achieve better quality definitions, but their coverage can never challenge 
the scrappy expanse encompassed by Urban Dictionary. A particular problem for
wiki-lexicography of slang is that it seeks to achieve consensus in an area of the lexis 
which is characterized by its flexibility and that the most active participants are not nec- 
essarily those best equipped or temperamentally best suited to the task of evaluating and 
encouraging other users’ contributions. 


19.6 NEW PROBLEMS FOR SLANG 
LEXICOGRAPHY 


The internet has not only offered new models of slang lexicography, it also provides a 
mass of new evidence for informal usage. Blogs, micro-blogs, and social-networking 
sites offer the possibility of monitoring slang usage around the world. Slang
lexicographers need no longer say ‘originally US’ on the basis of limited written evidence and a
hunch that a term will probably be adopted elsewhere if it has not been already: they can
now access data to substantiate their intuitions. Not only that, but it is also possible to 
monitor frequency of use. For example, wonga ‘money’ is listed in the OED from 1984 
and is currently very frequently used in British slang, but no citations with this spell- 
ing are accessible through Google Blog Search (<http://blogsearch.google.com>) from 
before October 2010 (wongah is listed since 2007), although blogs are indexed from 
2005 onwards, suggesting that the term’s recent popularity may result from its use as the 
name of a high-profile loans company (<https://www.wonga.com>). With this level of 


10 The best known is Wiktionary <http://en.wiktionary.org>, but many online gaming sites include
wiki glossaries of terms used within the game, which are generally characterized as slang. 
detail now possible, a new challenge for slang lexicographers is managing the quantity
of data available. 

In recognition of the existence of global Englishes, large slang dictionaries often claim 
to cover the slang of large parts of the English-speaking world. Although this is an admi- 
rable gesture, it is clearly impossible to fulfil this promise with any consistency because 
of the limitations of personal knowledge. The New Partridge had consultants for interna- 
tional slangs, but its coverage of American slang is still much more thorough than that 
of any other nation. We now expect more of a multi-volume slang dictionary than the 
imperial perspective offered in DSUE, but the cost involved in producing multi-volume
scholarly dictionaries of international slang can no longer be justified by their sales. This 
means that hard-copy slang dictionaries are likely to be narrower in scope in years to 
come, bringing the next problem of contemporary slang lexicography into focus. 

Dictionaries that do not attempt broad coverage often define their contents in 
national terms, in that they are marketed as dictionaries of American, Australian, or 
British slang, and so on. We have already seen that dictionaries of national slang vary 
in their inclusion of terms also used elsewhere, but a new issue has arisen in the inter- 
net age. We learn colloquial English in conversation with those around us, so it is often 
possible to make clear-cut distinctions between the colloquialisms of one nation and
another. However, we generally learn slang from our peer group, and if our peer group 
spends its time online playing games or hunting for celebrity gossip, the boundaries 
between national slangs are likely to dissolve until they represent little more than a con- 
venient fiction. Terms promoted in canny advertising campaigns or popular video clips 
can enter slang around the world practically simultaneously, and these media and popu- 
lar-culture slang terms will be particularly problematic for national slang lexicographers 
of the future. 


19.7 CONCLUSIONS 


We have seen that slang lexicographers range from the professional to the almost illit- 
erate. Even those whose dictionaries aspire to the standards of the best general dic- 
tionaries tend to acquire their skills on the job. The compilers of dictionaries issued 
in separate volumes as the work progresses can find themselves torn between the 
desire to improve their methodology and the wish to remain consistent (see Coleman 
2009a: ch.3; Coleman 2010a: ch.1). Those who produce high quality slang dictionaries 
do so in defiance of market forces. However, the demand for flimsy humorous paper- 
back dictionaries is now being fulfilled by online dictionary sites, so the publication of 
slang dictionaries is no longer a safe way of cashing in on a current trend. With less
competition in the book market, it is possible that the quality of printed slang lexicography
will continue to rise. 


20.1 INTRODUCTION


No other linguistic sub-field is as closely linked to lexicography as etymology.¹ Indeed,
whilst significant work on synchronic lexicology is done without any reference to dic- 
tionaries, major etymological breakthroughs, be they factual or methodological, are 
mostly expressed through lexicographic work, and when they are not, it is their sub- 
sequent acceptance by a reference dictionary which ultimately lends them support. 
Similarly, I know of almost no outstanding etymologist of our time who is not in some 
way linked to a major lexicographic enterprise: most of them are either authors of com- 
pleted or ongoing etymological dictionaries or current or former heads of etymological 
teams for general dictionaries. 

However, if the strong relevance of etymological lexicography (or etymography) for 
scientific knowledge building is self-evident, there exists probably no general
agreement about its scope. I follow here the definition Hartmann and James’s Dictionary of
Lexicography (DLex) gives of etymological dictionaries: ‘a type of DICTIONAR[IES] in
which words are traced back to their earliest appropriate forms and meanings’, this
tracing back being their assumed principal purpose. This means that general and/or
historical dictionaries (for which see Durkin, this volume Chapter 14, as well as Schweickard
2011) will not be tackled here, although some of them, like the Oxford English Dictionary 


1 Many thanks to the very fine lexicographers (and linguists!) who agreed to react to a first draft of 
this chapter, first of all to Philip Durkin, to whom I am greatly indebted, but also to Jean-Paul Chauveau 
(Nancy), Steven N. Dworkin (Ann Arbor), Yan Greub (Nancy), Roger Lass (Cape Town), Alain 
Polguére (Nancy), Laurent Sagart (Paris), Wolfgang Schweickard (Saarbriicken), and Thomas Stadtler 
(Heidelberg). 
(OED) or the Trésor de la langue française (TLF), contain encapsulated in them the best
available etymological dictionary of the language they describe. 

The element word in the DLex definition, although intuitively comprehensible, lacks 
technical rigour, and is therefore ambiguous. I will thus ban word from this chapter 
and make use instead of the threefold terminology (as well as the typographical con- 
ventions attached to it) established within the theoretical framework of Meaning-Text 
theory (see Mel’čuk 2012: 21-44): wordform (defined as ‘segmental linguistic sign that
is autonomous and minimal, i.e., that is not made up of other wordforms’), lexeme (‘set 
of wordforms, and phrases, that are all inflectional variants’), and vocable (‘set of lexi- 
cal units—lexemes or idioms—whose signifiers are identical, whose signifieds display a 
significant intersection, and whose syntactics are sufficiently similar’). I find this termi- 
nology particularly useful for etymological and etymographical purposes: first because 
it is coherently based on Saussure’s definition of linguistic signs and secondly because 
it reserves a term (lexeme) for the central unit ‘one signifier, one signified, all
inflectional variants’ of a polysemous vocable, which in most terminologies is not explicitly
named (mostly, there is talk about ‘words’ developing new ‘senses’ but sense only refers
to the signified and not to the combination of the signifier, the signified, and the
syntactics).² Thus, for example, the vocable TABLE—if one agrees, for the sake of simplicity,
on describing TABLE as a (very) polysemous unit rather than as a set of homonymous
ones—contains lexemes like TABLE1 ‘article of furniture consisting of a flat top and legs’,
TABLE2 ‘arrangement of items in a compact form’, and TABLE3 ‘upper flat surface of a cut
precious stone’, which in turn present the wordforms table and tables; in general,
dictionary entries are made up of vocables like TABLE.

A firm believer in the concept of proper names as a scalarly stratified part of the 
lexicon (see van Langendonck 2007), I nevertheless exclude here discussion of etymo- 
logical dictionaries of place names (for which see Styles, this volume), personal names 
(McClure, this volume), and other proper names. 


20.2 CONTEMPORARY PRACTICES 
IN ETYMOGRAPHICAL WORK 


Malkiel (1976) provides a book-length typology of etymological dictionaries, analys- 
ing them on the basis of eight autonomous criteria: (1) time depth (period to which the 
etymologies are traced back), (2) direction of analysis (prospection or retrospection), 
(3) range (languages dealt with), (4) grand strategy (structural division of the diction- 
ary), (5) entry structuring (linear presentation of the chosen features), (6) breadth 


2 In case of homonymy, each vocable is numbered separately, e.g. HANGER¹ n. ‘wood on a steep bank’
< Proto-Germanic HANGIAN (CODEE) vs. HANGER²1 n. ‘one who hangs’ and HANGER²2 n. ‘pendent or
suspending object’ < English (TO) HANG + -ER (CODEE).
(information given in the front- and back-matter vs. within the individual entries),
(7) scope (general lexicon vs. parts of it, e.g. borrowings), and (8) character (author’s 
purpose and level of tone). Amongst these criteria, I will use scope in order to distin- 
guish not so much among different types of etymological dictionaries (although that 
will also be the case), but among three grand etymological classes, which each make 
their own different demands of an etymologist, and which are sometimes dealt with in 
different dictionaries: inherited lexicon (Section 20.2.1), borrowings (Section 20.2.2), 
and internal creations (Section 20.2.3). For each of these classes, I shall try to give a gen- 
eral idea of the (methodological) state of the art, mostly on the basis of etymological 
dictionaries of European languages, and to draw attention to what I take to be the most 
profitable approaches within the field. 


20.2.1 The Inherited Lexicon 


Amongst the three major etymological classes, the inherited lexicon clearly gets the most 
attention in terms of etymological dictionaries devoted to its study. One defining feature 
of this kind of etymological dictionary is its comparative character (see Forssman 1990 
and Malkiel 1990: 1329-30). Indeed, as the inherited lexicon is typically etymologized by 
comparative reconstruction, whole language families (or branches of them, also called 
families) are usually taken into consideration. As a consequence, the arrangement of 
these dictionaries is prospective rather than retrospective (Malkiel 1976: 25-7), that is, 
their lemmata pertain to the reconstructed protolanguage rather than to the individual 
languages on which the comparison is based. Usually, the underlying question these
dictionaries set out to answer is where the inherited lexicon of currently spoken languages
comes from, and their ultimate goal is to reconstruct the lexicon of a proto-language. 
This is typically the case for the Dictionnaire Étymologique Roman (DERom; cf. Buchi
and Schweickard 2014), which aims to reconstruct Proto-Romance, that is, the common 
ancestor of the (spoken) Romance languages, following Jean-Pierre Chambon’s claim 
that Romance etymology could benefit from the comparative method (see Chambon 
2010). In this dictionary, comparative reconstruction is used, for instance, in order to 
reconstruct Proto-Romance */ˈbatt-e-/ trans.v. ‘to beat’ from Italian BATTERE, French
BATTRE, Old Spanish BATER and their cognates (Blanco Escoda 2011/2012 in DERom
s.v. */ˈbatt-e-/). What is standard practice in other linguistic domains is, however, quite
unusual in the field of Romance etymology, where scholars usually discard the compara- 
tive method as unnecessary in the face of all the written testimonies of (mostly classical) 
Latin. The entries corresponding to */ˈbatt-e-/ in the three major reference dictionaries
of Romance etymology, Meyer-Lübke’s Romanisches etymologisches Wörterbuch (REW),
von Wartburg’s Französisches etymologisches Wörterbuch (FEW), and Pfister’s Lessico
etimologico italiano (LEI), are indeed made up of written items as found in Latin
dictionaries: battuĕre (REW 1935³ [1911¹: battuere]), battuere (von Wartburg 1924 in FEW 1,
290b), and batt(u)ere (Calò/Pfister 1995 in LEI 5, 344a). Currently, there is no agreement
about the relevance of comparative grammar for Romance etymology (pro: Buchi 2010a 
and Buchi and Schweickard 2011; contra: Kramer 2011 and Varvaro 2011): the
methodological principles on which the DERom is based remain the subject of an ongoing debate.

With the Indo-European Etymological Dictionary project of Leiden University 
(see Indo-European Etymological Dictionaries Online), reconstruction goes even one 
step further and becomes articulated in a most interesting way: first, each of the
etymological dictionaries of individual branches of Indo-European³ reconstructs the
inherited lexicon of its immediate protolanguage, which then enables
reconstruction of the Proto-Indo-European lexicon. For instance, the Etymological Dictionary
of Latin and the other Italic Languages reconstructs, based on Latin, Faliscan, Oscan, 
Umbrian, and South Picene cognates, Proto-Italic *mātēr, *mātr- f.n. ‘mother’. For its
part, the Etymological Dictionary of the Slavic Inherited Lexicon uses Church Slavic,
Russian, Czech, Polish, Serbo-Croatian, Čakavian, and Slovene cognates for
reconstructing Proto-Slavic *mati f.n. ‘mother’. In the same way, the Etymological Dictionary
of Proto-Celtic reconstructs from cognates from Irish, Welsh, Breton, Cornish, Gaulish,
and Celtiberian Proto-Celtic *mātīr f.n. ‘mother’. Proto-Italic *mātēr, Proto-Slavic
*mati, Proto-Celtic *mātīr, and their cognates in Armenian, Hittite, etc. are then traced
back to Proto-Indo-European *méh₂-tr- f.n. ‘mother’. By its completion, this quite
revolutionary two-storied and (on the first floor) multi-flat dictionary edifice will serve as a
definitive replacement of Pokorny’s outdated but still highly valuable Indogermanisches
etymologisches Wörterbuch.

Dictionaries devoted to the inherited lexicon of language families will be able to achieve 
a high level of excellence if the subgrouping of the cognate languages with which they 
deal is perfectly established. On the other hand, they are most helpful precisely in estab- 
lishing these genetic relationships. Thus inheritance dictionaries like The Sino-Tibetan 
Etymological Dictionary and Thesaurus (STEDT), whose goal is to reconstruct the ances- 
tor language of over 200 languages spoken in South and Southeast Asia whose subgroup- 
ing is to the present day controversial, are of particular academic interest, as can be seen in 
the first part of this dictionary project, The Tibeto-Burman Reproductive System: Toward 
an Etymological Thesaurus (STEDTRepr), which presents etymologies relating to repro- 
ductive anatomy. An earlier publication, the Handbook of Proto-Tibeto-Burman (Matisoff
2003), conceived as a sort of companion to the STEDT project, however, received quite 
strong criticism because of structural flaws such as the lack of explicitness and thus of 
falsifiability, no safeguards against loans, and faulty Chinese comparisons (see Sagart 
2006). In respect to this last issue, the STEDT should in any case be consulted in paral- 
lel not only with Axel Schuessler’s ABC Etymological Dictionary of Old Chinese, but also 
with Laurent Sagart’s own work The Roots of Old Chinese (Sagart 1999). Without being 
a proper etymological dictionary, this book, which represents a major breakthrough in 
the field of Chinese etymology, etymologizes hundreds of lexical units pertaining either 
to the basic vocabulary (personal pronouns, numerals, body parts, etc.) or to culturally 
relevant terms (transportation, commerce, writing, etc.). 


3 To date, ten of them are published: Armenian, Greek, Hittite, Latin, Luvian, Old-Frisian, 
Proto-Celtic, Proto-Iranian (verbs), and Slavic (as well as, on another level, Proto-Nostratic). 


20.2.2 Borrowings


There is no lack of (more or less etymologically oriented) dictionaries of borrowings, 
some of them including also loan translations (calques), semantic loans, and loan 
blends. Be it in loanword dictionaries or in general etymological dictionaries, the
lexicographic treatment of borrowings has to pay close attention to dating: in principle (that
is, if the donor language has a historical record as well documented as that of the
borrowing language), the etymon has to be documented before the loanword in order to
lend credit to the proposed etymology. In practice, however, dating borrowings and
their etyma is far from being standard practice: only the most sophisticated dictionar- 
ies, like the FEW and the LEI, do it systematically. This is the case for instance in Fléss 
and Pfister 2012 in LEI 12, 1553-7, CATHEDRA/CATECRA, where Italian CATTEDRALE adj, 
‘pertaining to the seat of a bishop's office’ is dated from the first half of the fourteenth 
century, and its etymon, Middle Latin CarHEDRALIS, from the eleventh century; Italian 
"SESLONGA’ f.n. ‘reclaining chair, from 1830, and its etymon, French CHAISELONGUE, 
from 1710. But strictly speaking, the indication of one uncontextualized dating for a 
borrowing is of little significance. First, most datings are tentative and should therefore 
themselves be dated: each text edition hitting the market contains potentially its allot- 
ment of antedatings. If most readers of etymological dictionaries are aware of that, they 
are probably less mindful of another limitation of datings provided by dictionaries: even 
if a given dating holds as an absolute starting point, it says nothing about the—often 
quite lengthy—period between the first time a borrowing was used and its acceptance 
by the speaking community as a whole. Thus, one cannot but agree with Philip Durkin’s 
claim that ‘ideally, etymologies of borrowed items will account for such factors, explain- 
ing not only the initial adoption of a word, but its subsequent spread within the lexical 
system’ (Durkin 2009: 163), although very few etymological dictionaries go into such 
detail. The Deutsches Fremdwörterbuch, an etymological dictionary of foreignisms,
goes a long way in that respect. The entry Hierarchie from volume 7 (2010), for instance,
which covers twelve pages of text (including derivatives like HIERARCH,
HIERARCHISCH, or HIERARCHISIEREN), quotes twenty-six attestations, from the thirteenth
century to 2009, for HIERARCHIE1 ‘angels divided into orders’, twenty-four, from 1533
to 2003, for HIERARCHIE2 ‘ruling body of clergy organized into orders’, and seventeen,
from 1758 to 2009, for HIERARCHIE3 ‘classification of a group of people according to
ability or to economic, social, or professional standing’.
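
The chronological requirement stated at the beginning of this subsection (an etymon should be attested before its loanword) lends itself to a mechanical consistency check. The following Python sketch is only an illustration: the `Dating` class and the `etymon_precedes` function are hypothetical names, and the conversion of century-level datings into year windows (e.g. eleventh century = 1001 to 1100) is an assumption made here, not a practice of the LEI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dating:
    """A dating expressed as a window of years (hypothetical structure)."""
    earliest: int  # first year of the dating window
    latest: int    # last year of the dating window


def etymon_precedes(etymon: Dating, loanword: Dating) -> bool:
    """Return True when the whole etymon window lies before the loanword window."""
    return etymon.latest < loanword.earliest


# Datings taken from the LEI examples quoted above. Century-level datings
# are converted into year windows by assumption.
pairs = {
    # etymon: eleventh century; loanword: first half of the fourteenth century
    "CATTEDRALE": (Dating(1001, 1100), Dating(1301, 1350)),
    # etymon: 1710; loanword: 1830
    "SESLONGA": (Dating(1710, 1710), Dating(1830, 1830)),
}

results = {
    word: etymon_precedes(etymon, loanword)
    for word, (etymon, loanword) in pairs.items()
}
print(results)
```

Such a check covers only the relative chronology of first attestations; it says nothing about how provisional those attestations are, nor about the spread of a borrowing within the speech community.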

Another very nice example is Manfred Höfler’s Dictionnaire des anglicismes
(DictAngl), a model in many regards. In this dictionary, three stages of lexicalization 
are distinguished: quoted lexemes explicitly attributed to foreign languages (marked by 
square brackets), occasional borrowings in texts (marked by ¢), and borrowings which 
appear in the wordlist of general dictionaries (marked by ||). An example of the first 
stage can be found in the DictAngl s.v. hurdler: ‘[Le hurdler, comme les Anglais
nomment ce genre de coureur...]’, a quotation (‘the hurdler, as the English call this kind of
runner’) from 1889. As for French PACEMAKER m.n. ‘electrical device for stimulating or
steadying the heartbeat’, it is dated as ‘¢ 1962 pace maker; 1964 || Quill. 1965; Rob. S. 1970
pace maker’, that is, the first textual testimony of PACEMAKER dates from 1962 and is
written <pace maker>, whereas the modern spelling <pacemaker> goes back to 1964. 
The lexicographic acceptance of the borrowing can be dated to the Dictionnaire encyclo- 
pédique Quillet from 1965 in the modern spelling and to the 1970 Supplément of Robert's 
dictionary in the now disused written form <pace maker>. 

Most borrowing processes include more or less extensive phonological and/or mor- 
phological accommodation. Ideally, etymological dictionaries would point these out 
(see Buchi 2006), but at least in print dictionaries, space limitation means this is seldom 
the case. One exception is provided by the Dictionnaire des emprunts au russe dans les 
langues romanes, whose entries are punctuated by tags like ‘adapt. morph.’
(morphological adaptation), ‘chang. cat.’ (change in part of speech), ‘chang. genr.’ (change in gender),
‘chang. suff.’ (suffix change), ‘greffe suff.’ (graft: simplex falsely analysed as a derivative
which received, in place of its pseudo-suffix, a real one).


20.2.3 Internal Creations 


Within the three grand etymological classes, internal creations receive the least com- 
plete etymological coverage: quite often, they are simply listed, without further com- 
ment, in a ‘derivatives and compounds’ section under their base (see Section 20.3.2.). 
Only etymological dictionaries aimed at specialists apply to internal creations the 
same scholarly standards as to inherited lexicon and borrowings. That is the case, for 
instance, for Gábor Takács's Etymological Dictionary of Egyptian (EDE), which provides
not only explicit etymologies (about base and affix) for the derivatives it contains, but 
also supplies extensive references to the relevant literature (an advantage perhaps partly 
explained by the fact that this dictionary is dealing with a chronologically remote lan- 
guage stage, where little can be taken for granted): ‘derives (by prefix m-), as pointed out 
by H. Grapow (1924, 24), H. Smith (1979, 162), and P. Wilson (PL), from Eg[yptian] nhp 
‘bespringen (vom Stier), begatten (vom Menschen)’ (O[ld] K[ingdom], Wb II 284, 3-4) 
= ‘to copulate’ (FD 135) = ‘to procreate’ (Smith)’ (EDE s.v. mnhp n. ‘procreator’), the only 
missing information here being the semantic value of the prefix at issue.

Depending on the available sources and their datability, etymological dictionaries 
may provide first attestations for internal creations, thus enabling the reader, as affixes 
are only productive during determinate periods, to appreciate the accuracy of the
proposed etymologies. In his FEW entry of sixty-two pages devoted to French BALANCE n.
‘scales’, its cognates, and their derivatives and compounds, Jean-Paul Chauveau in FEW
2006 s.v. *BILANX (<http://stella.atilf.fr/few/bilanx.pdf>) thus provides not only explicit
etymologies, but also datings (where available, i.e. mostly for French and Occitan) for
derivatives, like BALANCETTE (circa 1180; + -ITTU), BALANCERIE (1415; + -erie), or
BALANCIER (1292; + -ARIU).

The time depth of etymological dictionaries of languages whose documentation 
goes back only to recent periods is of course shallower than that of the FEW, but 
this is only a difference of degree and not a difference of kind. For instance, the
Dictionnaire étymologique et historique de la langue des signes française traces back
many of the signs of its wordlist only to the eighteenth (e.g. ‘connaître’, ‘difficile’, or
‘nuit’) or even to the nineteenth century (e.g. ‘effacer’, ‘fatigué’, or ‘poésie’).

If derivatives and compounds are, as a general rule, properly etymologized (i.e. if they 
are explicitly etymologized!), etymological dictionaries often struggle with less central 
classes of internal creations like ellipses, clippings, or blends. As for idioms, they often 
completely lack any etymological analysis, the worst being pragmatemes like English 
OH, BOY! interj. ‘(cry of surprise, disappointment, or excitement)’, which is only dealt
with in Liberman’s very specialized Analytic Dictionary of English Etymology (ADEE) 
of fifty-five entries (ADEE: 17-18). The appearance of new meanings is hardly ever 
considered worth mentioning (see Section 20.3.3). 


20.3 CURRENT ISSUES IN ETYMOGRAPHY 


In this section, I will discuss a few topics which seem, at the same time, central for 
the theory and practice of etymological dictionary making and still lacking a con- 
clusive and widely accepted solution. These thoughts aim to contribute to ‘the peri- 
odic cleansing and, if necessary, the bold replacement of antiquated tools’ used by 
etymographers as advocated by Malkiel (1976: vii). Problematizing these questions 
at a cross-linguistic level and, ultimately, disregarding possible language-related spe- 
cificities, means that I defend the idea of general etymology (like general phonet- 
ics or general semantics) being a viable concept. True, owing to the strong need in 
this field of work for extensive language-specific knowledge in areas like historical 
grammar or philological data, etymologists are of necessity permanently attached 
to a language or at least to a language family. But cross-linguistic collaboration will 
most certainly yield interesting findings both about general mechanisms of language 
evolution and about techniques of detecting occurrences of them in order to firmly 
establish etymologies. 


20.3.1 Underlying Definition of Etymology 


The first issue I shall raise is on a very general level and concerns the underlying defi- 
nition of etymology (see Alinei 1995) shown by etymological dictionaries. Basically, 
there are two possibilities: etymology can be seen as ‘that branch of linguistic science 
which is concerned with determining the origin of words’ (OED)⁴ or as ‘the branch of
linguistics which investigates the origin and history of words’ (Dictionary of Historical and 
Comparative Linguistics). According to the DLex, most etymological dictionaries tend to 


4 All boldfaces are mine. 
operate on the basis of the second definition: ‘the emphasis ... is on the original form of the
word (also called its ROOT or ETYMON), but often its whole history or “curriculum vitae” is
documented’ (DLex s.v. etymological dictionary). Indeed, no self-respecting Romance ety- 
mologist, for instance, would agree on anything other than a history-oriented definition of 
etymology. This conception goes back to a paradigm change formalized by Baldinger (1959) 
and introduced mainly by von Wartburg (through his FEW masterpiece) and by Gilliéron, 
who ridiculed the previous approach to etymology by comparing it to a biography of Balzac 
consisting of the two following sentences: ‘Balzac, sitting on his nanny’s knees, was dressed 
in a blue-and-red striped gown. He wrote The Human Comedy’ (Gilliéron 1919: 133).

As it is, though, only a very small group of etymological dictionaries—amongst 
them the FEW, the LEI, and the Dictionnaire Étymologique de l’Ancien Français
(DEAF)—practice in a consistent manner ‘etymology-history of words’, as Baldinger
(1959: 239) labelled this, at the time, novel kind of etymology, and practically no 
one-volume etymological dictionary does, a noteworthy exception being the 
OED-based Oxford Dictionary of English Etymology (ODEE). In this dictionary, indeed, 
the reader will not only find information, for example, about the origin of the noun 
PIRATE (Latin PIRATA), but also about its semantic enrichment from ‘sea-robber’ in the
fifteenth century via ‘marauder’ (sixteenth century) to ‘(literary or other) plunderer’ in 
the eighteenth century. 


20.3.2 Wordlist 


Even today, etymological dictionaries are mostly published on paper, and usually in 
prestigious (and costly) premium editions. This adds to their respectability and durabil- 
ity, but limits available space, which has direct consequences for the wordlist: ‘etymo- 
logical information ... is often omitted from derivatives... which are treated as RUN-ON 
ENTRIES’ (DLex s.v. etymological information). This seems to me very risky, because only
a proper etymological analysis can establish that a vocable which presents itself syn- 
chronically as a derivative is not inherited or borrowed and represents thus the result 
of an internal derivation: etymologically speaking, there is no such thing as a transpar- 
ent derivative! And such a proper etymological analysis will be prevented for vocables 
which do not appear in the wordlist. For that reason I disagree with Malkiel’s assessment
(1976: 4) that ‘furnishing of a separate etymological base for each member of a family, is
scientifically unhelpful’: on the contrary, I would plead in favour of granting entry status 
to all vocables, including derivatives. 

Some etymological dictionaries go even further in their groupings. For instance, 
the Etymologisches Wörterbuch des Ungarischen presents in the same entry
macro-etymologically linked vocables with distinct etymologies, such as the Latinism
TENOR m.n. ‘voice between bass and alto; tenor singer; tone; content’ and the probable 
Germanism TENORISTA m.n. ‘tenor singer’ (Gerstner 2002: 572, 579). Such practices 
should be avoided, be it only because they make automatic extraction and statistical 
treatment of etymological classes very hard. 
20.3.3 Etymological (and Etymographical) Unit


Probably the greatest room for improvement left to etymological
dictionaries is closely linked to the fact that even the best etymologists hardly ever give
some thought to the question of what constitutes the etymological (and etymographical)
unit: is it vocables like TABLE (with all its meanings) or lexemes like TABLE1 ‘article
of furniture consisting of a flat top and legs’ (see Section 20.1)? In my opinion, individual
lexemes and not whole vocables are best hypostatized as etymological and etymographical
units (see Buchi (2010b) and the almost systematic implementation of this principle
in the TLF-Etym project, e.g. s.v. gémination, where one distinguishes a Latinism, a
Germanism, and an internal creation).

If one accepts this approach, one particular etymological category appears as crimi- 
nally neglected by the whole profession: semantic evolutions. Each etymological cat- 
egory requires a specific set of information; for semantic evolutions, two of them seem 
relevant: first, the direct etymon, that is the (possibly no longer existing) lexeme of 
the same vocable which constitutes the starting point of the semantic evolution, and 
secondly hints about its coinage, be it by naming a figure of speech like metaphor or 
metonymy which worked as a universal semantic mechanism or by cross-linguistic 
comparison. This latter procedure would greatly profit from the ‘Catalogue of semantic 
shifts’ gathered at the Institute of Linguistics in Moscow (see Zalizniak 2008). Instead 
of introducing French SAISIR2 ‘to understand’ (since 1694) loosely in an unnumbered
paragraph after SAISIR1 ‘to grasp’ (since circa 1100, von Wartburg 1962 in FEW 17, 21ab,
*SAZJAN 2), where the semantic link between ‘to understand’ and ‘to grasp’ remains
implicit, one could explain the plausibility of such a semantic shift by
cross-referencing it to parallels like English TO CATCH, German BEGREIFEN, Italian AFFERRARE, or
Russian ПОНЯТЬ, which all present the same semantic evolution (see Zalizniak 2008:
228). 


20.3.4 Etimologia Prossima vs. Etimologia Remota


In theory, most etymologists would probably be in favour of etimologia prossima, that 
is, of putting forward direct or immediate etymologies. But in practice, etymological 
dictionaries are full of examples where the etimologia remota approach prevails, for 
instance in Vasmer’s Russisches etymologisches Wörterbuch (RussEW): ‘über poln[isch]
malować aus m[ittel]h[och]d[eutsch] mâlen’ (s.v. малевать) or in Cortelazzo and
Zolli’s Dizionario etimologico della lingua italiana (DELI): ‘dal pers[iano] ..., passato in
t[ur]c[o] e diffuso in Europa attraverso il fr[ancese] taffetas’ (s.v. taffettà). The
etymological discourse is better focused in The Concise Oxford Dictionary of English Etymology
(CODEE), which gives the immediate etymology first: ‘F[rench] ménage, earlier
menaige, manaige [, normal development of] [Proto-]Rom[ance] *mansiōnāticum, f[ormed
on] L[atin] mansiō, -ōn-’ (s.v. ménage). In my opinion, only ‘F[rench] ménage’, that is
the etimologia prossima part of the etymology, is relevant. Indeed, the fact that French 
MENAGE is itself inherited has no bearing on its being borrowed by English: had French
MENAGE been borrowed from another language or created from French material, the 
borrowing into English would have occurred exactly in the same way. This holds of 
course even more for the etymology of the Proto-Romance etymon of MENAGE, which 
is definitely irrelevant. So this information is superfluous by virtue of Grice’s maxim of 
quantity (Grice 1989). But there is more: as the expertise of an etymologist is inevitably 
less profound in linguistic areas other than those dealt with in the dictionary he com- 
piles, informing the reader about etimologia remota constitutes some form of hubris. 
In the given example, the only defects concern minor inaccuracies which go back to 
the—in this case indirect, as the CODEE is based on the ODEE, which is itself based 
on the OED—source in Romance etymology (probably the FEW) or rather to a gen- 
eral flaw of traditional Romance etymology: as the vowel system of Proto-Romance 
(the proto-language reconstructed from Romance cognates) was based only on tim- 
bre and not on quantity, and as Proto-Romance had no equivalent of written Latin <n> 
before <s> nor <-m>—to say nothing about the fact that in Proto-Romance, stress was 
phonological—(Buchi and Schweickard 2011: 630-1), ‘Proto-Rom. *mansiōnāticum’ is
unsatisfactory by contemporary standards. But the central problem lies in the fact that 
the energy and the space allotted to etimologia remota is then no longer available for 
etimologia prossima: in this case, even if the etimologia remota was flawless, it would not 
make up for the fact that the reader is left in the dark about the question whether the 
two lexemes mentioned by the CODEE, namely MENAGE1 ‘housekeeping’ and MENAGE2 
‘domestic establishment’, are both borrowed from French or if one of them developed in
English (see Section 20.3.3). Unfortunately, this kind of lack of balance is very common 
cross-linguistically, even in the best available etymological dictionaries,⁶ and I would
like to strongly advocate its replacement by the etimologia prossima approach. 


20.3.5 Degree of Formalization 


Most (retrospective) etymological dictionaries use only one level of etymological
classifiers. For instance, the RussEW etymologizes the lexical units it contains by labels like
‘aus griech[isch]’ (aemou), ‘ursl[awisch]’ (свет), ‘Deminutiv’ (rymexyo), or ‘Verstärkung’
(xopoylom xogntp). Similarly, the DELI will make statements like ‘comp[osto]’ (s.v.
postvocalico), ‘da un imit[ativo]’ (badare), ‘da [secento]’ (secentismo), ‘lat[ino]’ (lago), ‘lat[ino]
parl[ato]’ (pestello), ‘loc[uzione] fr[ancese]’ (enfant terrible), or ‘v[o]c[e] dotta, lat[ino]’ 


5 Well-established etymologies of course lend credibility to possible etyma (see Durkin 2009: 170),
but that does not necessarily mean they have to be quoted extensively: explicit or even implicit references 
to the relevant reference works serve the same purpose. 

6 Thim (2011: 90, footnote 31) comes to the same conclusion concerning the ADEE: ‘Although the
problem is by no means restricted to them, the Romance borrowings in particular raise the question 
whether users of a historical dictionary of English need to be given the etimologia remota when the 
immediate source of the borrowing, which after all is the much more relevant information with regard to 
the history of English, is so often neglected or misrepresented.’
(ossequio). Both dictionaries—and they are by no means alone!—also occasionally go
discursive, for example, RussEW s.v. полька (‘der Tanz ist 1831 in Prag aufgekommen
und den unterdrückten Polen zu Ehren benannt’), where the wording leads the reader
to think of the noun as a borrowing from Czech, but neither ‘borrowing’ nor ‘Czech’ is
made explicit, or DELI s.v. sanseveria: ‘chiamata così in onore di Raimundo di Sangro,
principe di Sansevero’, where the entry answers the reader’s supposed cultural curiosity,
but says nothing about the signifier, the signified, or the syntax of the etymon, nor the 
language it pertains to, nor its etymological class. 

However, authors of etymological dictionaries belong, in Swiggers’ (1991: 100)
wording, to the species of ‘gardeners’ rather than of ‘moles’, that is, rather than being ‘buried
in their etymological investigations’, they make it their profession ‘to homogenize the
grounds and to collect the harvest’.⁷ Thus formalization of their etymological discourse
plays a major role. I think it would be both more scientific and more helpful for lay
readers if etymological dictionaries adopted a two-level model, the first level being reserved
for the conceptual three-way division among inherited lexicon, borrowings, and
internal creations, each of them then being subdivided into more specific categories.
Hopefully that would also prevent etymologists from being absorbed by that ‘quicksand
of tiny facts and petty commitments’ described by Malkiel (1976: 82). In any case, I agree
with his assessment that ‘a higher level of formalization in linguistics ... tends to entail
more sharply pointed discussion’ (Malkiel 1983: 133).
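
As a rough illustration of what a two-level model could look like once encoded for automatic extraction, the following Python sketch pairs each of the three top-level classes with a closed set of second-level labels. Everything in it is hypothetical: the names `EtymClass`, `SUBCLASSES`, and `classify` are invented for the example, the second-level labels are simply collected from categories mentioned earlier in this chapter (the label for inherited lexicon is a placeholder), and the three sample entries are classified for illustration only.

```python
from collections import Counter
from enum import Enum


class EtymClass(Enum):
    """First level: the conceptual three-way division."""
    INHERITED = "inherited lexicon"
    BORROWING = "borrowing"
    INTERNAL_CREATION = "internal creation"


# Second level: illustrative subcategories (assumed labels, not an
# established standard).
SUBCLASSES = {
    EtymClass.INHERITED: {"inherited"},
    EtymClass.BORROWING: {"loanword", "calque", "semantic loan", "loan blend"},
    EtymClass.INTERNAL_CREATION: {
        "derivative", "compound", "ellipsis", "clipping", "blend",
        "semantic evolution",
    },
}


def classify(top: EtymClass, sub: str) -> tuple[EtymClass, str]:
    """Return a two-level classifier, rejecting mismatched level pairs."""
    if sub not in SUBCLASSES[top]:
        raise ValueError(f"{sub!r} is not a subclass of {top.value!r}")
    return top, sub


# Sample entries, classified for illustration only.
entries = {
    "CATTEDRALE": classify(EtymClass.BORROWING, "loanword"),
    "BALANCETTE": classify(EtymClass.INTERNAL_CREATION, "derivative"),
    "SAISIR2": classify(EtymClass.INTERNAL_CREATION, "semantic evolution"),
}

# With explicit first-level classes, statistics become a one-liner.
class_counts = Counter(top for top, _ in entries.values())
print(class_counts)
```

The point of the closed sets is that an entry cannot be tagged with a subcategory that contradicts its top-level class, which is exactly the kind of explicitness that discursive etymologies lack.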


20.3.6 Bringing Etymological Dictionaries to an End 


It does not seem possible to conclude this chapter without addressing the embarrass- 
ing question of the publishing rhythm of etymological dictionaries. In fact, there is an 
important dichotomy that should be added to the phenomenology of etymological dic- 
tionaries, namely that between completed ones and uncompleted ones. Unfortunately, 
indeed, the most advanced and most accomplished representatives of etymological lexi- 
cography tend to be almost impossible to terminate in a satisfactory way. This is the case, 
for instance, for the LEI, the first installment of which was published in 1979 and which 
covers to date letters A, B, and parts of C, D, and E (as well as the beginning of the part 
devoted to Germanisms). The same holds for the DEAF, which goes back to 1974: under 
Thomas Städtler’s leadership, this dictionary was recently split, after having published
letters G-K, into two complementary parts: while letters D-F will be compiled in 
accordance with DEAF’s philologically and linguistically outstanding standards, the 
remaining (approximately 54,000) lemmata from A-C and L-Z will be published in the 
timesaving form of a rudimentary semantic classification of the Heidelberg file slips.


7 Swiggers (1991: 100): ‘peut-on parler de types d’étymologistes (personnellement, je vois au
moins deux types essentiels: les ‘taupes’ enfouies dans leurs recherches étymologiques; les ‘jardiniers’
homogénéisant le terrain et rassemblant les récoltes)’.
It appears we etymologists of the early twenty-first century have a collective duty to
carry out: going in search of means of successfully completing etymological dictionaries 
which seem ‘unfinishable’. Of course, online dictionaries with their unlimited
possibilities for adding and correcting data go a long way toward addressing this concern. And
let us not forget that no (etymological) dictionary was ever completed without a healthy 
dose of pragmatism! 


20.4 CONCLUSION 


In conclusion, it has to be emphasized that, as a whole, (at least European) etymogra- 
phy has reached an excellent standard. What shortcomings I was led to point out in the 
course of this chapter seem directly related to the fact that even the best educated and 
most professional ‘etymologically-minded lexicographer’ (Malkiel 1976: 7) is constantly
under some cultural pressure to reach out to the (supposed) needs of the non-specialist
by answering (supposedly naive) questions about the origin and history of ‘words’. This
of course sidetracks the etymologist from the real goal of presenting in a dictionary, that 
is, in a semiformalized form, results from advanced etymological research. I would thus 
advocate a firm anchoring of etymological lexicographical work in linguistics—in
science (as opposed to culture). In my opinion, this would also have benefits for the general
public, as popularization often means reformulating naive questions in order to answer 
them in a more pertinent way. 

Many other theoretical and practical issues of etymological lexicography—to quote 
just a few: selection within the ever-growing available primary data, inclusion or dis- 
regard of proper names, or handling of unknown etymologies—could have been dis- 
cussed in this (too?) short chapter. But the reader might agree with Malkiel (1983: 127), 
for whom ‘the ability to control one’s garrulousness has at all times been a major virtue 
in an etymologist’.
21.1 INTRODUCTION


For the user of a living language, whether as a native or second language, a dictionary 
can without question be a helpful practical tool and valuable reference resource, but it 
is one that complements knowledge of the language gained by the speaker by using it in 
discourse, and it supports that use. To the user (typically a reader) of a ‘dead’ language, 
a dictionary is not merely helpful or valuable: it is arguably indispensable for enabling 
understanding of a language for which live interactive discourse with native speakers is 
not a possible alternative or complementary route to the same end. The dictionary of a 
dead language therefore has a special, heightened importance, and this is inextricably 
allied to particular issues in its compilation, themselves similarly a consequence of that 
absence of possible interaction with living native speakers. 


21.2 GENERAL METHODOLOGICAL CHALLENGES




The creator of a dictionary of a dead language naturally faces the various normal chal- 
lenges of producing any dictionary, as discussed elsewhere in this volume, but there are 
also the effects of the obvious distance between the dictionary’s users and the object 
language (and its ‘original’ native users), which present their own challenges as well as 
exacerbating some of the regular ones. Indeed, this is not only a distance but an inherent 
and complex discontinuity, and it leads such dictionaries, as well as having similarities 
to various types of dictionaries of living languages, to have some significant differences 
from them too. Differences in the range of items included and the types of information
provided about them are key results of this discontinuity. In particular, they reflect the 
process of preparing the dictionary—both research and presentation—being adapted 
accordingly, that is, to be appropriate for the needs of an audience of users who can be 
expected to differ significantly from typical users of dictionaries of living languages in 
respect of what they seek from a dictionary. 

For a living language there is a possible and indeed likely overlap between the
dictionary’s users and the speakers whose language it describes; or put another way, the users
of the dictionary will often be users of the language whose use of it includes directly 
interacting in speech with other users of that language. Moreover, there will be a parallel 
overlap and likely familiarity of the dictionary’s users with the extralinguistic context(s) 
of the language, including, for instance, the culture, society, and circumstances in which 
it is used. For a dead language, by contrast, the dictionary’s users’ contact with native 
users of the language is essentially indirect, only through texts written by users of the 
language, living when they wrote, of course, but no longer alive. (In practice, some of 
the texts may well have been the work of non-native speakers, but typically such writers 
would themselves have had at least the possibility of direct or indirect contact of some 
kind with contemporary native speakers.) Furthermore, there is a corresponding paral- 
lel in separation from the extralinguistic context of the language, and this separation 
is all the more stark in that much of what may be known or thought about the culture, 
society, and circumstances of the language’s original use may itself be constructed on the
basis of the surviving texts. 

It would be easy to characterize the principal challenges in producing dictionaries of 
dead languages solely in terms of the linguistic discontinuity problem, namely the lack 
of available native speaker competence in the language. The lexicographer can have no 
direct access to speakers of the language on whose usage to base the dictionary. (Written 
language, when the writer is not available to be asked to provide any clarification needed 
about the text, provides only indirect access.) However, equally, the typical user of the 
dictionary has no such speakers with whom to expect to engage in that form of live 
interaction, so the significance of this ‘problem’ in practice may be somewhat
diminished even if it is a theoretically important one for any form of linguistic description,
whether lexical or grammatical. 

A comparison with grammars is instructive. Of course, when producing a descrip- 
tive grammar of a language, which properly requires distinguishing what is ungram- 
matical from the merely unattested, the linguistic problem is quite fundamental, since 
ungrammaticality is identified from native-speaker judgements; however, grammars of 
dead languages ordinarily confine themselves to describing the attested usage (which 
may generally be presumed to be grammatical) without prejudice to whether other 
usages were also grammatical, and this at least goes a long way towards satisfying the 
main need of readers of the surviving texts, namely to understand those texts. To some 
extent dictionaries too need to distinguish the impossible (e.g. meanings) from the 
merely unattested, and so those of dead languages may confine themselves in a sim- 
ilar way for this same reason. However, whereas grammar can be considered a set of 
more abstract, mainly intralinguistic relations between various kinds of linguistic units
(sounds, morphs, words, clauses, etc.), dictionaries aim to describe lexical relations, 
and these are ‘connections’ not just among linguistic items but also between those items 
and the world.¹ The overall evidence base needed for grammars and dictionaries differs,
therefore, with regard to the account that needs to be taken of extralinguistic context. 
Accordingly, alongside the linguistic challenge, the separation in respect of the extralin- 
guistic context is a critical factor for dictionaries of dead languages, in that neither lexi- 
cographer nor dictionary user has relevant first-hand experience of it (nor even direct 
access to anyone who has). This separation cannot be so easily sidestepped: indeed, the 
divide is bridged most substantially by texts that are thus simultaneously (and paradoxi- 
cally) both effectively what is to be described (being the principal source for knowledge 
about the language), for which a knowledge of that context is essential, and a vital source 
for knowledge about that context, for which understanding the texts (and so a fortiori 
their language) is crucial. This may seem a trivially obvious point to make, but in fact 
it goes to the heart of the particular task of producing a dictionary of a dead language, 
because the lexicographer is subject to constant tension between competing demands, 
which emerge as the broader, ‘purer’ ones of the language (and its original users) on the 
one hand—namely to do justice to the language overall in its original context despite the 
limited evidence available from which to work—and the narrower, more practical ones 
of the dictionary’s users on the other, who are themselves often chiefly concerned with 
understanding the self-same limited sources as those on which the dictionary is based. 
These kinds of issue caused by distance and discontinuity are of course not unique 
to dictionaries of languages ordinarily thought of as ‘dead’: they are a concern to any
lexicographer documenting a language in a period before the present day, to which the 
term ‘dead’ might for present purposes equally properly be applied. However, the longer 
the intervening time is, the greater the effects are likely to be, and so it is the very earliest
period of attested use of a language for which they are usually most significant, corresponding
to what are popularly thought of as ‘dead’ languages.² In this chapter I draw
examples from dictionaries of ancient Greek and Latin, as languages which, though they 


1 Strictly, grammar likewise involves some extralinguistic relations with the context, such as 
systematic sociolinguistic variation in which values may be associated with particular grammatical 
(phonological, morphological, syntactic, etc.) patterns. 

2 Surviving evidence need not necessarily be ever less for each earlier period: earlier periods may 
sometimes be far better attested than more recent ones as a result of particular circumstances, such as 
a larger population or greater literacy leading to the creation of more written texts in those periods, or 
better subsequent preservation of texts because they were written on more durable materials or because 
they continued to have prestige in later times and were frequently read and copied, etc. However, the 
general point is valid, that we find diminution over time because texts may be lost at any stage and once 
lost cannot be replaced. (Interesting questions may be raised by the (re)discovery of additional texts and
other similar apparent increases in the available evidence; see also Hays (2007: 483) on the exclusion of 
texts from a corpus as a result of redating or reattribution.) The difference in extra-linguistic context is 
certainly more likely to increase with increasing separation in time. 
both have a later history, sit well under this definition of ‘dead’ and provide rich evidence
in having been the object of very many dictionaries over the centuries.³

We can now turn from these two main underlying causes of challenges in preparing 
a dictionary of a dead language, viz. linguistic and extralinguistic discontinuity, to their 
particular effects as seen in the types of choices faced by the lexicographer, that is, deci- 
sions that yield the distinctive features of their dictionaries and can thus be seen in those 
features. For the sake of analysis I examine these under two broad headings, namely the 
extent of the dictionary (i.e. what variety or varieties of the language it is to include) and 
the content (what kind of information is included about the selected language). Extent 
and content correspond, albeit loosely, to the lexicographical processes of research and 
presentation, and so from them we can start to see how the lexicographer responds in 
practical terms to a dead language. In both areas the decisions taken reflect a compro- 
mise between what is possible, given the surviving evidence, and what is useful, given 
what the dictionary’s users use the dictionary for. Moreover, we see that this compro- 
mise often consists in research results having consequences for decisions about presen- 
tation and, conversely, presentational needs informing decisions about the research to 
be undertaken. 

Throughout the discussion it is important to bear in mind several cautionary meth- 
odological points. First, we are identifying common matters for decision: while we 
expect to find much commonality in the resulting responses insofar as relevant similar 
conditions apply, we must still expect to find some diversity in the decisions taken on 
any issue (e.g. reflecting diverse linguistic situations or user needs relating to particular
languages).⁴ Second, no decision on any of these matters will have been taken in isolation:
observed features represent the lexicographer’s judgement about how to balance
the various demands of the different issues (including also those common to diction- 
aries more generally). Third, the editorial process, as witnessed in the resulting text, 
may not necessarily have involved systematic or specific consideration of all, some, or 
even any of these matters, and so some observed patterns may well not be the result 


3 The eventual fragmentation of Latin through regular language change into the array of daughter 
Romance languages, of which many survive, in fact underlines the level of linguistic change and confirms 
the analysis of Latin as ‘dead’ in the relevant respect, especially when seen alongside the survival of Latin 
itself as a second, i.e. non-native, language into the medieval period and beyond. The post-classical
history of Greek is somewhat different, and unlike the modern reflexes of Latin which bear different 
names (Italian, French, etc.), Greek has not changed its name. Nonetheless, the ancient Greek language 
can be considered ‘dead’ on linguistic grounds and moreover it is a very appropriate comparison with 
Latin in terms of similarities and differences in its historical and cultural context. 

The changes in lexicographical practice over the many centuries during which dictionaries have been 
prepared of these two languages—a longer period than the lexicography of many other languages—are 
also themselves interesting although they will not generally be discussed here because of their limited 
direct relevance to the overall analysis, except insofar as they bear on the development of current 
methods. 

4 This is to say nothing of the diversity typically to be found within a single dictionary resulting from 
its preparation by many people over many years. See Hays (2007) and Corbeill (2007) on changes in the 
Thesaurus linguae Latinae over its long period of preparation. 
of taking formal principled decisions about the practice to be adopted in particular
respects. Indeed, on the basis of the resulting text, it is not always possible to distin- 
guish conscious lexicographical decisions from instinctive or intuitive ones; in fact, 
however, both are important and valid responses to the challenges faced and thus they 
merit identification and evaluation. Finally, we should note the types of evidence avail- 
able on which to base our evaluation. In or accompanying a dictionary there is usually 
some ‘metalexicographical’ material discussing the preparation of the text, but explicit 
statements of practice are often very limited. Our main source is the dictionary text itself
(which may or may not conform consistently and exactly to any metalexicographical 
description that accompanies it). In examining the text we should be naturally cau- 
tious about inferring from the result the processes that brought it about, since we see 
only what the lexicographer chose (consciously or otherwise) to include and we do not 
see what was not included nor, typically, the reasons for the choice. Still, we may draw 
some plausible conclusions by interpolation between the circumstantial causes out- 
lined above and the observed features as apparent effects, and the outcome may include 
recognizing implicit decisions which even the lexicographers themselves may not have 
been fully aware of taking. 


21.3 EXTENT 


Research corpora for the linguistic analysis of modern living languages typically con- 
tain many millions of words, and this is known to be essential to capture low-frequency 
usages otherwise unlikely to be found in them; moreover, a great deal of trouble is taken 
to establish what variety or varieties of the language the corpus contains and can be
considered to represent (see Kupietz, this volume). For a dead language, without speakers
to be investigated as an alternative form of evidence, the written corpus (its delineation
and its relation to the use of the language as a whole) is clearly all the more critical.
In respect of all aspects of corpus choice (period, register, and so on) the limits are
the available evidence and its potential usefulness to the user. Because the amount of 
surviving text is essentially fixed (with only limited amounts of evidence newly com- 
ing to light and no new evidence being created), the lexicographer must judge what is 
to be included as the basis of the dictionary by balancing the need, ideally, for as large 
a corpus as possible, against the need for the result to be appropriately representative 
and helpful: extending the corpus, for example chronologically, may lead to a diction- 
ary that includes more and so may appear to extend its usefulness, but it may as a result 
be one that is less representative (e.g. for the period relevant to a particular text a user 
is reading) or indeed straightforward for its users. Closely related to selecting the vari- 
eties of the language and corresponding corpus, there is a further choice to be made, 
namely how far to represent in the dictionary the usage encompassed by it, that is, how 
many (and which) of the attested lexical items and their senses and uses. In this, as in the 
choice of corpus, judging the needs of the user is again the key consideration. 
Two of the major dictionaries of Classical Latin prepared during the twentieth century
will serve to illustrate some of the issues here. By the end of the nineteenth century the 
standard scholarly dictionary of Classical Latin in the English-speaking world was that of 
Charlton T. Lewis and Charles Short (published in 1879 by Harper Bros in New York under 
the title Harper’s Latin Dictionary and released by OUP in Britain as A Latin Dictionary, 
soon known simply as ‘Lewis & Short’ or L&S), an enlargement of Ethan Andrews’s translation
(1850) into English of Wilhelm Freund's Latin-German dictionary (1834-45). The
twentieth century saw work on two dictionaries to be based on entirely fresh readings of 
original Classical Latin texts, the Oxford Latin Dictionary (OLD) and the monumental 
Thesaurus linguae Latinae (TLL). Because of their intended independence from previous 
dictionaries in this respect, both these new works faced the challenge of deciding on the 
range of texts to be read and the method of reading. 

The TLL, begun in 1894, has a base corpus that includes in principle all surviving Latin 
from ‘the earliest times’ down to AD 600, a period of more than 900 years. In practice, the
first half of the period was indeed examined exhaustively, with an archive compiled of quotation
slips for every known occurrence of every word; the texts from the second half were in
general treated selectively, with specialists excerpting from texts those examples they con- 
sidered significant in some way. This mammoth work had by 2010 published the finished 
dictionary text for only two-thirds of the alphabet (covering A to P with the exclusion of N), 
having accumulated over 11 million slips. By contrast, the OLD, commissioned by OUP in 
1924 to cover Classical Latin down to the end of the second century AD, was finally complete
in print in 1982. In their different ways, both acknowledge the practical question of the huge 
extent of the material for the period beyond c. AD 200 (down to 600, also the notional end-point
for L&S which the OLD was evidently aiming to supersede). Although the choice of
corpus is not the only cause of disparity in progress in this instance, it is clear that the choice 
of a restricted corpus can be to the users’ advantage if it increases the likelihood of the dic- 
tionary being completed. Quite apart from the task of collecting the evidence (now, at least, 
made easier by electronic resources), the process of assessing the accumulated evidence can 
be laborious and any form of limit that can be justified on other grounds may well be advantageous.⁵
Harvey (1999: 120, commenting on a medieval Latin dictionary enterprise) made
a strong point when he observed that ‘university library shelves were replete with fascicules 
of definitive dictionaries of various languages that were complete for the first few letters, but 
that then petered out, either abandoned ignominiously or else still in progress after decades; 
and scholars were as likely to wish to look up a word beginning with S or T as they were one 
commencing with A or B’.⁶


5 Much of the corpus for the OLD was excerpted by cutting up printed editions and pasting onto
slips, with texts selected for inclusion partly on the basis of availability for this purpose. For a modern 
perspective on the challenges of dealing with large corpora see Hillen (2007), who examines the slip 
method and the more recent possibility of searching electronic data. 

6 On the troubled history of the OLD see Henderson (2010) and Stray (2012). See also Stray (2011) on the
general difficulties of bringing lexicographical projects (for classical languages) to a successful conclusion. 
Hays (2007) ends by asking whether some readers might in fact be willing to have a slightly less thorough 
process of compilation for the TLL if it makes it more likely to be finished within their lifetime. 
The first editor of the OLD, Alexander Souter, in fact went on after his involvement in
the OLD to produce a Glossary of Later Latin covering essentially what was new in the 
vocabulary of the language in the period AD 200-600, and it is instructive to examine its
contents. In quantitative terms its length at almost 500 pages suggests prima facie that a 
great deal was lost to the OLD by the period’s omission, but a qualitative analysis reveals 
that a large part of the new lexical material in the later Latin period was either the result 
of regular processes of derivation already known in the period covered by the OLD or 
straightforward borrowing from Greek. Thus, in view also of the strongly standardized 
nature of Classical Latin as the enduringly prestigious written variety of that language 
(which itself seems likely to have inhibited lexical innovation in the written language at 
least), clearly very many texts even from the later period are largely accessible to a reader 
armed with only the OLD, a sound grasp of Latin derivational morphology (i.e. grammar),⁷
and a reasonable knowledge of Greek (or access to a Greek dictionary). Despite
the considerable extent of the material available from this period, this work shows that 
the overall contribution that including that material would have made to the OLD in 
terms of further items and meanings (rather than additional later examples of words, 
meanings, and usages attested in the earlier period) was perhaps not so great as to make 
their absence from the OLD as problematic for its users as some perceived, at least in 
respect of the meanings of words being read in later Latin texts. Thus, lexicographi- 
cal practice may be tailored to users’ needs to good effect in terms of the chronological 
extent of the corpus, taking account of relevant language-specific factors. 

Limiting the extent of the dictionary to just the period from which users are expected to be 
reading texts is one way of making the lexicographical task more manageable; indeed there 
is ostensibly some broader merit in focusing on that part of the language that the diction- 
ary’s users are most likely to be reading. By reducing the likely overall number of entries and 
senses, this also brings about an effective and potentially beneficial control on the overall size 
of the dictionary, possibly making it easier to ‘get around’ the dictionary (e.g. the user has
fewer pages to turn to find the relevant entry, shorter entries will be more likely to be able to 
appear on a single opening, fewer possibilities for sense are offered, etc.), or freeing up space 
for more detail in the presentation of the content that is included.⁹ However, providing the
reader with what he or she obviously wants—at its most basic, the meanings of the words 
in the text(s) being read by the user at the time—must be tempered by the lexicographer 
with some regard to what the reader should (and unconsciously probably does) want. This 


7 As a new departure, the OLD in fact included entries for derivational affixes.

8 Stray (2012) notes a number of reviews of the OLD which, though mainly positive, were still 
critical of the cut-off date chosen. The reasons appear to be partly an assumption that a dictionary must 
explicitly address the authors being read by its users (regardless of whether it would in fact cover those 
authors’ language), probably based—more in sentiment than reason—on the familiar extent of L&S. 
See below for a more principled linguistic justification for enhancing the extent chronologically. On 
huge lexicographical effort for little apparent reward see Hillen (2007), discussing the benefits of TLL’s 
selectively excerpted material. 

9 A shorter dictionary may also be a cheaper product for the user to buy, which may be an important
factor too. 
includes, crucially, setting the usage of those texts in their appropriate context and setting,
and for a language known only in writing this has important consequences. Without living
speakers who can be asked for broader information about a given item’s usage (its connotations,
resonances, sociolinguistic value, etc.) in addition to its meaning (which can at least
be expected of a dictionary), the user of a dictionary of a dead language will (or should) 
often want to be able to obtain some of that information from the dictionary. 

Again, language-specific factors are important. For dictionaries of classical languages, which 
have a surviving corpus containing much high literature, often richly intertextual in character, 
users do typically appreciate information beyond the possible meaning(s) of the item, recog- 
nizing the importance of the distribution of any item in literary contexts, and so they expect to 
be able to learn more about this: for example how the same author uses the item elsewhere, how 
other contemporary and earlier authors used it, etc. We see this implicitly in the reviews of the 
OLD, for instance, with their howls of protest relating to the omission of one author or another 
(mainly due to the chronological cut-off), not necessarily with any particular regard for 
whether that author's actual usage (in terms of lexical items and their meanings) would be cov- 
ered by the dictionary as printed. We see it also in smaller dictionaries of both Latin and Greek 
(e.g. the still-current Intermediate Greek-English Lexicon (1889) abridged from the seventh edition
of Liddell & Scott's Greek-English Lexicon, and Cassell’s New Latin-English English-Latin
Dictionary (ed. D. P. Simpson, London 1959-68)), which indicate (sometimes with brief quotation)
the (principal) authors in whose works a usage is found. As well as period, then, selection
of authors can be another important consideration for ‘extent’.

Usefulness for literary appreciation as a factor to be considered in decisions regard- 
ing extent, however, raises the key wider question of how far a dictionary of a dead 
language is to be a guide to the language based on the surviving language or whether it 
is to be merely a guide to that surviving language (or even more narrowly that part of 
the surviving language being generally read). Some users may feel that the latter is all 
they generally want or need; indeed lexicographers may sometimes feel that the latter is 
often all that can reasonably be produced. Still, there is implicit support for the former 
in the way that dictionaries in general are expected to be used: users are provided with 
at least the range of meaning that can be discerned for an item and ordinarily invited 
to establish for themselves what part of that range applies in the instance that they have 
come across. In other words, the very presentation of content requires and expects 
users to judge among possibilities and thus to take account of other usage of the same 
item (even if only to dismiss it). The consequence of this for the extent of the dictionary
is the implicit onus it places on the lexicographer to provide as broad a picture as pos- 
sible of usage by including, for instance, as wide a range of genres or types of sources as 
possible. 

It is not practical here to discuss all the many kinds of variation within language that 
the lexicographer might try to provide evidence for (age, sex, register, region, etc.).¹⁰


10 Baraz (2007) examines some relevant examples in the TLL, identifying, e.g., potionare as lower 
register by comparison with a higher register equivalent potionem dare on the basis of distribution across 
different texts. 
They are in any case not unique to languages that are now dead, although they are areas
in which the existence of only written evidence in its necessarily limited quantity may 
make the usual lexicographical task more difficult, especially in view of the tendency
for literacy in past times to correspond to a greater level of education than the average
for its day, which in turn might tend to limit the diversity found within the section of the 
population of language users writing the texts that have survived.¹¹ There is one type of
variation that stands out as particularly problematic for a dictionary of a dead language, 
namely the possible differences between spoken language and written language. This is 
important because dead languages were once alive, and that means that they changed 
over time. In the written record we may see evidence of change in the appearance of a 
novel usage at some point in time, but we would expect that many changes first emerged 
in the spoken language, eventually ‘showing up’ in the written language: a usage may 
well have been a current possibility in the minds of original language users (and so be
important to present-day dictionary users reading texts from their day) even some time 
before the earliest surviving written evidence for it, especially since the survival of evidence
is likely to have been haphazard (i.e. the earliest surviving written evidence cannot
ordinarily be thought likely to be the actual earliest occasion when the item was
used in writing). For variation according to medium, the basic lexicographic research 
solution is essentially the same as for any other form of variation, to examine (with due 
caution) text types likely to show relevant features, that is, features carried across from 
the spoken language into the written language which are not found in more prototypically
‘written’ texts; such types include apparent literary representations of more everyday
speech (e.g. in the mouths of particular types of characters) or less literary or formal
documentary evidence (letters, inscriptions, graffiti, etc.). The major modern diction- 
aries of the classical languages encompass this range of material within their declared 
remit. Beyond this, however, we return to the question of period: since spoken innova- 
tion may be expected typically to have preceded any written traces of that innovation, 
the inclusion of a longer period may be a further way in which a dictionary can provide 
some broader suggestion of the language as it existed in an earlier period than that for 
which direct contemporary evidence survives.¹²


11 There is also some measure of correlation between prestige (or enduring popularity) of particular
texts and their likelihood of survival through transmission. Both factors tend to lead to a lack of diversity 
in the surviving evidence even for a language that did not undergo significant standardization. Obviously 
diversity is further constrained by many other events or factors (invasions, writing on non-durable 
materials, etc.) that led to the non-survival of the texts that were produced, as well as by differences in 
what kinds of linguistic acts were done in writing in the first place. 

12 In fact, for Latin we may also note that the standardization process out of which the prestigious
classical form of the language emerged came after the writing of a number of texts that still survive. 
Evidence of the daughter Romance languages suggests a continuity of use in the spoken language of 
some linguistic features found in these early texts but not found in texts conforming to the classical 
norm. While no particular justification is needed for forwards-looking inclusion (i.e. dictionary users
may well take it that vocabulary from earlier periods continued in use, at least as passive vocabulary, in 
later periods), this evidence confirms the value of recognizing some ‘retrospective’ extent too. (Given 
this, it is a shame that reviews of the OLD typically complained of the absence of later material for the 
The final observation to be made here is that, as in any dictionary, regardless of the
extent ultimately adopted, the more complex it is, the greater the complexity to be cap- 
tured in the presentation of the content. In particular, the resulting collection of material 
forming the extent is far greater than the linguistic knowledge of any individual original 
user of the language, and yet the language in its day (and thus every surviving text) was 
produced from individual original language users’ linguistic knowledge. The presenta- 
tion of the content has to take some account of the fact that the ‘full picture of, for exam- 
ple, an individual item’s usage across all the axes of variation (especially time) is not the 
view that any particular writer or reader would have had of the possible use of that item. 

Both Greek and Latin in fact have dictionaries that seem to challenge this position, 
namely lexicons of single authors (e.g. of Homer, etc.). However, while their sharply 
defined narrow extent might seem to exclude ipso facto the surrounding contextual 
information that is valuable in giving a sense of the linguistic setting in which a text was 
first produced by its author and experienced by its audience, that narrowness of extent 
can be seen simply to displace the treatment of what is usually dealt with implicitly by 
the extent of a general dictionary, into the content of the single-author dictionary: this 
information has to be encompassed in the treatment of the items and meanings that are 
included, whether implicitly in the choice of interpretation offered by the lexicographer 
for a particular example or in other more explicit ways. Accordingly, rather than leaving 
the dictionary user to establish the relevant context for whatever text is being read (as 
would be the case with a general dictionary), the lexicographer identifies more closely 
with the dictionary user, needing to bring out whatever pertains to that author's particu- 
lar texts within the entries for the items and meanings actually found in them. 


21.4 CONTENT 


Although dictionary users may also require other information about lexical items, most 
commonly they look for two things: form (i.e. spelling and inflection) and, most of all,
meaning. Moreover, they must generally be assumed not to know already the informa- 
tion that they will find when they come to what it is they are looking for (for otherwise 
they would probably not be consulting the dictionary in the first place). After establish- 
ing for themselves what can be said about form and meaning for individual items, then, 
lexicographers must present this information to users in an arrangement such that it 
does not require them to know what they are looking for before they find it. Although 
true of other dictionaries too, this is particularly significant for any dictionary of a dead 
language, for the user cannot so easily fall back on other resources (e.g. native intuition)


supposed difficulty it would cause in reading authors of that period, many of whom were particularly 
classicizing, but overlooked the benefit that including later material might equally well have for readers 
of texts from the core period.) 
to help locate the right information and, having found something that seems right, to
confirm that the right information has in fact been located. 

As far as form is concerned the principal issue for the lexicographer is the identifi- 
cation of lexical items, and this raises three interrelated matters. First, dead languages 
inevitably present difficulties in the reliability of their ‘attestation’: essentially, is the 
apparent attested form really present in the author's original text? While some texts may 
survive in their original form (e.g. as inscriptions on stone or bronze, and even these may 
be damaged or incomplete), many survive only through a manuscript tradition during 
which parts have been lost or altered and non-original features have intruded into the 
text. Although interesting, such intrusions are not evidence of the original language, and thus the
lexicographer must expend some effort in establishing—so far as is practical—reliable 
versions of the texts to be used in the research on which the dictionary is to be based. Of 
course entire careers can be devoted to the examination of the textual tradition of even a 
single author's work, so the lexicographer cannot ordinarily be expected to be the expert 
ruling on the text of any particular work: rather, the dictionary must rely on the special- 
ist expertise of those who have published editions of or otherwise worked closely on the 
texts. (There is consequently a danger of circularity when editors’ decisions in prepar- 
ing texts are reliant on their appreciation of the language based on existing dictionar- 
ies, and this is something of which dictionary users, including those editors, should be 
aware.) Indeed, a criticism levelled at the OLD even when first published was its appar- 
ent dependence (in part as a result of its long gestation period) on editions no longer 
considered to be the most reliable available.¹³ That said, although a dictionary is primarily
a dictionary of the language and not of the particular texts covered, still less of their
various editions, the lexicographer must expect a dictionary of a dead language to be 
used by those reading texts in various editions (as well as unedited texts): entries need 
to take some account of multiple possible readings where it is relevant to do so, in order 
not to prejudice any scholarly debate over the text (especially when the lexicographer is 
not in a position to judge and might be seen to be ruling decisively in one direction or 
another from a deceptively authoritative position) and because the lexicographer sim- 
ply cannot know which versions of texts a reader might require assistance with. A glance 
through Liddell & Scott’s Greek-English Lexicon (‘LSJ’, the ninth edition having been
revised by H. Stuart Jones), for instance, yields numerous instances where textual doubt 
is clearly signalled, including such expressions as ‘quod fort. legend.’ (‘which is perhaps
to be read (instead)’, at συστατός sense 3). Similarly the OLD adds ‘(s.v.l.)’ (‘si vera lectio’,
i.e. ‘if the reading is correct’) in quotations to indicate doubt, a practice also previously
adopted by LSJ.¹⁴

Once a text corpus is identified and possible aberrant intrusions have been dealt with 
or accommodated, the other concerns relating to form are two types of variation. The 


13 They may not all have been chosen for reliability in the first place; see also note 5. Still, most could
be thought more modern and up-to-date than those on which L&S was based. 

14 See also Baraz (2007) and Hays (2007) on lexicographical problems associated with variant
readings and reliability of editions with regard to the TLL. 
first is inconsistency of orthography, the other grammatical inflection. The lexicogra-
pher has to decide which instances to take together as representing a lexical item (or,
at least, a single dictionary entry). Orthographic inconsistency can be expected in any 
language lacking a standardized spelling system, as can more consistent variation corre-
sponding, for instance, to dialectal differences. The diverse range of forms needs not only
to be recognized together but also to be recorded and made findable for the benefit of the dic-
tionary user who comes across the forms in reading. Thus dictionaries such as the OLD
and LSJ provide numerous duly alphabetized cross-references to the relevant entries to
enable the user to find the entry for an attested spelling; within entries an explicit section 
containing spellings and references can spell out in what array of orthographic forms 
the item has been found. In addition to an apparent overabundance of evidence in this 
respect (more than one spelling per form), a dictionary of an inflected language must 
equally grapple with patchy attestation within the expected range of morphological 
forms for an item. (Of course if extra alternative forms exist, e.g. for the stem of a partic- 
ular tense, they can be handled similarly to multiple spellings, i.e. with cross-references 
and a ‘forms’ section, as in the OLD and LSJ.) When there seem to be gaps (e.g. the 
past tense is never attested), the lexicographer must decide whether to remark on this, 
that is, the question is whether the gap is merely one of surviving attestation or whether 
the forms (even within a predictable paradigmatic pattern) were avoided or even una- 
vailable. This may require additional research in modern scholarship and/or ancient 
testimonia.16
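
The cross-referencing described above can be pictured as a simple index from each attested form to the headword under which it is treated. The following is a minimal sketch only: the two Greek pairs are the ones cited in the accompanying note on Greek dictionaries, while the data layout and the names `ENTRIES`, `build_cross_references`, and `look_up` are assumptions made for illustration, not the practice of any particular dictionary.

```python
from collections import defaultdict

# Hypothetical miniature entry list: headword -> forms a reader might meet
# in a text. The data layout is an assumption made for this sketch.
ENTRIES = {
    "παύω": ["ἔπαυσε"],   # 3sg aorist indicative active, 'he stopped'
    "φέρω": ["οἴσω"],     # future tense, 'I will bear'
}


def build_cross_references(entries):
    """Map every recorded form (and each headword itself) to its headword(s)."""
    index = defaultdict(set)
    for headword, forms in entries.items():
        index[headword].add(headword)
        for form in forms:
            index[form].add(headword)
    return index


def look_up(form, index):
    """Return the headwords under which `form` is treated, in sorted order."""
    return sorted(index.get(form, ()))


INDEX = build_cross_references(ENTRIES)
print(look_up("ἔπαυσε", INDEX))
```

A set is used for each form because a single spelling may belong to more than one entry, in which case the reader has to be sent to all of them.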

As well as the question of which forms belong together and how they are to be 
recorded, it is particularly important to consider what form to print as a label to iden- 
tify each entry. Dictionaries are ordinarily arranged around headwords in alphabetical 
order, but the greater the number of forms for any item within the corpus, the greater 
the need for normalization to create the orderliness which allows a user to find the 
desired entry straightforwardly. The type of form chosen in dictionaries of both classi- 
cal languages reflects tradition as much as grammar: nouns are given in their nomina- 
tive singular form, adjectives in their nominative singular masculine form, and verbs 
in their first-person singular present indicative active form. This headword form on its 
own is not sufficient for the user, who cannot necessarily predict from it what the other 
inflected forms of the same item are: accordingly, the headword is supplemented by 
enough further forms to indicate which inflectional pattern the item follows (so-called
‘principal parts’). Still, the dictionary user needs to be aware that even the presence of 
these forms printed in the dictionary is not of itself evidence that those specific forms 


15 The same policy is frequently adopted in classical Greek dictionaries even for forms that are not
alternatives but are merely part of a complex morphological system in which inflection at or towards the
start of a word makes it hard for the form found in a text to be a reliable guide to the alphabetic position
of the entry in which it is to be found, e.g. the 3sg aorist indicative active form ἔπαυσε (‘he stopped’)
from παύω (‘I stop’); similar considerations apply to highly irregular verbs such as the future tense οἴσω
(‘I will bear’) from φέρω (‘I bear’).

16 For instance, Dickey (2003) discusses an instance of apparently deliberate avoidance of the vocative
singular of Latin nouns ending in -eus. 
existed (nor the ones that they imply): they may all merely be normalized inferences
from the array of evidence gathered. 

It is in the area of meaning that the effects of limited evidence are felt the most. The
lexicographer must first form a view of what the item means in each instance under con- 
sideration within the selected extent. Thus the lexicographer as researcher acts in a way 
as a representative reader, resembling the future dictionary user reading any of the same 
texts. This may itself be a challenge with limited evidence in which every example is
contentious. A context may be inadequate and offer little help for a precise identification
(e.g. the instance is in a list of types of plant or weapon, etc.). An instance may be ambig-
uous between several possible interpretations, either unintentionally (the author meant
only one or other, but it is not clear which) or deliberately: in a highly literary corpus 
word-play is often a difficulty, in that actual examples may simultaneously evoke more 
than one sense of a word without implying that such combinations of senses could not 
be separated and used without one another in other instances; even outside deliberate 
word-play more than one meaning may be at work in a single instance, at least in terms 
of the broader connotations raised.17

Moreover, the lexicographer’s task goes beyond this appreciation of instances (again 
with cautious and judicious use of expert assistance from specialists): it is then to infer 
and describe the item’s semantics, that is, what it is that allows the item to be used in the
contexts in which it appears. The method for doing this is to observe shared features 
across examples to serve as the basis for sense distinctions. Thus, if the evidence base is 
limited, it can be hard to see common semantic features within the attested usage for an 
item, and when only a single example survives (whether overall or for a particular con- 
text or apparent value) it is inherently impossible to tell, even if the meaning in context 
is clear, what the meaning of the item is (as opposed to that instance of it). In such cir- 
cumstances, a dictionary can but make it clear that its compiler knew of only the single 
example and adduce such secondary information (e.g. from related items, etymology, 
etc.) as may seem relevant. 

In terms of presenting meaning, the balance to be struck is between arrangement 
by senses as identified in the evidence (e.g. basic senses then transferred or figurative 
senses, concrete senses then abstract ones, etc.) and arrangement by contextual features. The former
may present a clearer picture of the item overall and seem linguistically more satis- 
factory, somehow seeming to represent the broader picture of the item from the per- 
spective of the original writer of a text (choosing an item and sense from among those 
available within the language).18 However, for the modern dictionary user, although it
may be more helpful once the user has found the right entry and sense (since it shows 
more clearly how it fits with the item’s other usage), it can make it harder for the user 


17 Corbeill (2007) discusses his own TLL article for pravus (‘crooked’), noting how it became clear
that even in metaphorical use with reference to morals there was a significant literal component and, 
interestingly, vice versa. 

18 This perspective may of course be significant for those concerned with the techniques of 
composition for a text. 
who does not yet know the sense of the instance found and is coming to the dictionary
for help to identify it. 

Context-based arrangement, by contrast, makes it easier for the user with a particular 
example to find the relevant part of the entry, since the arrangement reflects the kind of 
information the user might already have regarding the instance in question. However, it 
may then not be the clearest way of presenting the overall picture of an item’s meaning 
and use. Thus there seems to be a tension between presenting the language and meeting 
the needs of the modern user. 

Since the modern user is ordinarily an audience for texts (rather than a producer, 
unlike many users of dictionaries of living languages), their relation to the texts is argu- 
ably like that of an original audience in needing to assess any given instance of an item 
and identify its meaning, etc., and audiences—whether ancient or modern—would have 
to use the surrounding context (together with knowledge of the language, internalized 
in the original native user, supported by the dictionary for the modern audience) as a 
guide for this. Thus the ‘audience’ perspective is no less linguistically valid and it is what 
the dictionaries typically adopt, albeit somewhat tacitly: they are generally designed to 
facilitate interpreting texts and, potentially, enabling as good an appreciation as possible 
of how an original audience (who would equally have to have interpreted them) might 
have perceived them.

In dictionaries of the classical languages the weight given to contextual information 
as ‘signposting’ is very evident. First, such dictionaries often arrange material within 
entries by the grammatical construction of the item, for example by keeping transitive 
and intransitive uses apart, dividing uses according to the case governed by the item, 
etc. Second, contextual semantic information is given (e.g. whether a verb’s subject is 
human, animate, abstract, etc.). Finally, definitions in the larger dictionaries are not 
strictly intended as translation equivalents, though they may often serve as such: any 
lexicographer is acutely aware that the equivalence between original word and transla- 
tion even for an individual sense of an item is imperfect. Instead the aim is to give the 
reader a picture of the meaning of the item so that the dictionary user can understand 
and interpret (and, if desired, translate) the contextualized instance at hand in the text 
being read (which is more amenable to translation). Unlike dictionaries of living lan- 
guages, then, dictionaries of dead languages tend not to provide fine-grained distinc- 
tions of sense but broad groupings with sufficient information for the user to reach a 
reasonable judgement on the example in the text the user is reading.19 (Such informa-
tion may, as in the OLD, include brief descriptive comments on any particular examples
quoted by the dictionary, indicating the way in which they fit in the position assigned 
to them within the categorization.) We regularly find only two levels of hierarchical 
division (typically sense and subsense) and only rarely are large numbers of senses or 


19 This characteristic of classical dictionaries has been commented on by reviewers: Stray (2012: xv) 
notes Kenney’s series of reviews of fascicles of the OLD which found it ‘more flexibly organized’ than TLL 
but ‘at times over-subtle in its distinctions’. In fact, Kenney (1970: 91) refers to a dictionary as ‘a classified 
directory of contexts’.
subsenses distinguished: the user is helped to identify the particular meaning in the con-
text being read rather than being invited to guess how best to classify it within some
complex nuanced structure, ready to be added to the dictionary itself. 

To take an example, the entry for ὀφθαλμός in LSJ defines sense I as ‘eye’ and gives
a phrase ἔχειν ἐν ὀ. (found in Xenophon’s Anabasis) with the definition ‘to have before
one’s eyes’, indicating the word in question in italics, and the contextual information in
roman type. Also within sense I we find ‘eyes were painted on the bows of vessels’ (using
the same typographic convention), bringing a transferred use of ‘eye’ (i.e. ‘a pictorial
representation of an eye’) under the same sense. Sense II captures a less concrete mean-
ing but one still relating to the visual organ: ‘the eye of a master or ruler’. Sense III is ‘the
eye of heaven’, and sense IV ‘the dearest or best’ (a purely contextual sense, as a meta-
phorical interpretation of ‘eye’ considered as a highly valued organ of the body). Senses
V, VI, and VII move away from bodily eyes, etc. to other transferred uses: V ‘eye or bud
of a plant or tree’, VI ‘a surgical bandage covering one or both eyes’, and VII ‘Archit. . . . the
disks forming the centres of the volutes of an Ionic capital’. Thus, we see the broadness
of the groupings even in a major dictionary and in them we see the expectation that the
user will judge the interpretation of an instance in a particular text being read, given the
guidance that ὀφθαλμός is used, for instance, in various places where ‘eye’ could be used
as an English equivalent.20

With the division and ordering of senses, and the presentation of content more gener- 
ally, we return to the important difficulty that is the discontinuity of experience between 
the original language users and the modern lexicographer and dictionary users. It is one 
thing to help a reader to work through to finding a meaning for an example; it is also 
possible to set out a range of potential alternative interpretations, connotations, and fur- 
ther allusions for a user to appreciate in addition. However, it is quite another matter to 
try to enable the user to see things in the way that a native would have, for example to 
have the same first impression on hearing or reading something. Even a reasonable gloss 
of an item with straightforward semantics (i.e. one that is not polysemous) is likely not 
to convey everything. 

The situation is not wholly bleak. Corbeill (2007) discusses several examples from 
the TLL (e.g. facies and gestus) in which the arrangement of senses seeks to offer some 
insight into how the native users of Latin would have perceived the items in question; 
furthermore this arrangement is supported by presentation of additional ‘metalexical’ 
information, namely ancient etymology (e.g. associating facies with facere). The divide 
in experience, therefore, is one that can be amenable to bridging in small ways, by means 


20 In a similar vein, see James (2010) for a detailed discussion of several entries from the Cambridge
Greek Lexicon, an intermediate dictionary in preparation aimed at learners of Classical Greek, where he
takes the line that learners may need more guidance (and thus more subtle distinctions) to help them
towards understanding a text. Again it is clear that lexicographers must consider their audience. See also
Baraz (2007) and Hays (2007) for detailed analysis of the development of the structuring and division of
some TLL entries. Corbeill (2007) talking about the TLL entry for pravus describes the article as ‘[cuing]
the user to this slippage [sc. between literal and metaphorical crookedness across the use of the word] 
without evaluating what it may mean’ (my italics); see also note 17. 
of contemporary testimonia such as etymologies, discussions of who used which words
(e.g. Aulus Gellius on the distribution of the oaths mehercle ‘by Hercules’, (ede)pol ‘by
Pollux’, and ecastor ‘by Castor’ between male and female speakers), etc.21 This miscel-
laneous information cannot be expected to be complete for any item nor available for
every item, but its inclusion does allow the lexicographer to help the user better appreci- 
ate the original context of a text being read. Similarly the dictionary may present more 
obviously historical or cultural contextual information, such as the diagram and expla- 
nation of the layout of a table at a cena (a Roman upper-class dinner party) given in L&S 
(s.v. accumbo). 

A second relevant feature typical of dictionaries of dead languages and seen widely 
in those of Latin and Greek is the simple borrowing or non-translation of lexical items 
when they appear in derived contexts, that is, definitions effectively cross-refer to 
other definitions. For instance, the OLD defines phaleratus as ‘wearing phalerae’. The
effect this has is twofold. First, it allows greater precision about what can reliably be 
said about the context: thus we may be able to say that an X was ‘a maker of Y’ without 
knowing for certain what Y is, which is properly a matter for the entry for Y in any 
case. Second, it starts to reveal the relationships among lexical items and also between 
lexical items and what they refer to in the world. This is one effective way of trying to 
tread the line between offering the modern user an explanation that does not mean 
anything to him—because the cultural references or concepts involved are so differ- 
ent or unfamiliar—and one which is more accessible to the modern user but in danger 
of being anachronistically unrepresentative of what the original language user would 
have understood.22

This brings us to the final major aspect of the dictionary of the dead language, namely 
its status as a work itself and especially the balance between having authority and being 
authoritative. The lexicographer always strives for a fair account of the surviving evi-
dence, and, if it is diligently analysed, this gives the dictionary a degree of authorita-
tiveness; users can reasonably rely on the information provided. However, as we have
seen, the compiler of such a dictionary is repeatedly invited to go beyond the evidence 
into territory for which perhaps native knowledge and competence would be the only 
true source of authority. In the absence of this, it is unsurprising that dictionary users 
may come to a dictionary assuming that it can and will provide the information needed 
and that this information is uniformly certain. In how such a dictionary presents itself, 
however, we have seen ways in which this attitude is carefully discouraged: lexicogra- 
phers of dead languages instead encourage their users to approach their dictionaries 


21 On Latin oaths see Ashdowne (2008) with further references: Gellius’ description in the second 
century AD of the distribution has been shown to be accurate with respect to their use in the Old Latin 
comedies of Plautus and Terence, although the extent to which these represented the contemporary 
everyday language is hotly debated. 

22 On aspects of the treatment of cultural differences and related issues, see Cablitz (2011), discussing
the lexicographical recording of endangered languages in the modern world. 
with different expectations and to reach views about interpretation for themselves aided
by rather than derived from the dictionary.23


21.5 CONCLUSION 
Dictionaries of dead languages are in many ways similar to those of living languages,
in attempting to present a picture of the vocabulary of the language. They are, however,
limited by the constraints of the evidence that happens to have survived. They tread a
fine line between limiting their coverage to what there is direct evidence for and seek-
ing to offer the broader picture of the language more generally, whether explicitly in
what they say or implicitly in how they encourage a reader to think independently about
the value of the instance at hand. 


23 See also Corbeill (2007) who discusses the ‘oracular’ nature of the TLL: it does not pronounce 
authoritatively that X word means Y (in the modern sense of oracle), but rather with cryptic allusiveness 
(like an ancient oracle) it invites its users to consider what the overall sense might be. The liberal use of 
the question mark and qualifiers such as ‘prob.’ and ‘perh.’ in the OLD is a constant signal of the caution
urged on the reader. 
22.1 THE THESAURUS


A close sibling of the dictionary, the thesaurus is a work of lexicographical reference 
which presents lexical facts with semantic domains as its core organizational princi- 
ple, rather than in alphabetical arrangement. Far from being a simple repositioning of 
existing dictionary entries in topical order, thesauruses are the oldest recorded form of 
lexicographical work, and blend facts about the language with facts about the world in 
which the language is used. As a result, the classification system at the heart of a thesau-
rus represents a synthesis of the conceptual vocabulary of a language, and of the relative
place of each concept in the wider conceptual system of which it is a part. 


22.1.1 At the End of the Alphabet 


While it may be natural for many present-day readers to consider a thesaurus as an 
accessory to a dictionary, given the dominance of the latter format’s alphabetical order, 
this situation is mostly due to the dictionary system's ubiquity in recent years. By con- 
trast, such was the historical dominance of the meaning-based ordering of information, 
and the dictionary’s alphabetical arrangement so strikingly novel in its early publication 
history, that Robert Cawdrey in his 1604 A Table Alphabeticall felt it necessary in his
prefatory note ‘To the Reader’ to explain this system, in a notoriously awkward passage:


If thou be desirous (gentle Reader) rightly and readily to vnderstand, and to profit 
by this Table, and such like, then thou must learne the Alphabet, to wit, the order of 
the Letters as they stand, perfecty [sic] without booke, and where euery Letter stand-
eth: as b neere the beginning, n about the middest, and t toward the end. Nowe if the
word, which thou art desirous to finde, begin with a then looke in the beginning of
this Table, but if with v looke towards the end. Againe, if thy word beginne with ca
looke in the beginning of the letter c but if with cu then looke toward the end of that
letter. And so of all the rest. &c. (A Table Alphabeticall 1970 [1604])


It is understandably difficult for a modern reader to comprehend the need to explain 
something as now ingrained as ordering items alphabetically, but Cawdrey’s labour can 
give a useful indication of the artificiality of such an arrangement. 
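
Cawdrey's instructions amount to a search procedure over an ordered list: estimate roughly where a word falls, then narrow the search letter by letter. A minimal sketch of the same idea follows, using binary search from Python's standard `bisect` module. The word list is a hypothetical sample rather than a transcription of Cawdrey's table, and the function names are invented for illustration.

```python
import bisect

# Hypothetical sample of headwords, kept in alphabetical order.
TABLE = sorted(
    ["abandon", "baptism", "calamity", "cure", "nature", "temperate", "vapor"]
)


def find(word, table):
    """Binary search: return the position of `word` in `table`, or None if absent."""
    i = bisect.bisect_left(table, word)
    if i < len(table) and table[i] == word:
        return i
    return None


def rough_position(word, table):
    """Cawdrey's heuristic: how far through the table to start looking (0.0 to 1.0)."""
    return bisect.bisect_left(table, word) / len(table)


print(find("cure", TABLE))
print(rough_position("abandon", TABLE), rough_position("vapor", TABLE))
```

The search relies on nothing but the spelling of the word, which is the point made in the surrounding discussion: the ordering is efficient for retrieval and carries no information about meaning.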

The thematic and non-alphabetical thesaurus, by contrast, has a more immediately 
logical classificatory system, and a longer historical pedigree. As an alphabetical sys- 
tem is by far the most efficient option when a reader speedily requires a particular 
structured piece of data about a known and specific word, and as this is the prime usage 
case of a modern dictionary, the alphabet has held an almost unrivalled sway over lexi- 
cographical arrangement since Cawdrey’s time. Accordingly, most thesauruses include 
an alphabetical index, either within their pages or as a separate volume, for ease of 
lookup. 

Nonetheless, given that alphabetical arrangement exists purely for efficiency of access, 
it is no exaggeration to say that this makes such an ordering intrinsically uninteresting. 
Alphabetical order tells a reader nothing about the word itself other than its opening 
configuration of letters, and leads to the alphabetical fragmentation of facts, detaching a 
dictionary so arranged from any possibility of linking order to meaning. 


22.2 THE STRUCTURE OF A THESAURUS 


By contrast to alphabetical ordering, a thesaurus places meaning at the heart of its 
arrangement. Lexical facts are clustered with other, similar facts, arranged on the 
grounds of semantic features, prototypicality effects, usage domains, or thematic 
harmony. Core to this principle is that, beyond a top-level arrangement devised for 
pragmatic reasons (see Section 22.2.1), thesauruses are arranged logically and system- 
atically, and it should be apparent to a user how neighbouring entries are related to 
each other. 

While this system requires an alphabetical index for convenience, confident and fre-
quent users of a thesaurus such as Roget’s Thesaurus of English Words and Phrases (1852)
can become familiar with the system of arrangement, and find material with some ease. 
In many historical cases, as outlined elsewhere in this chapter, such thesauruses’ struc- 
tures exist either in tandem with, or derived from, an attempt to systematically catego- 
rize the world and all of human experience, with the assembly of lexical facts but one 
part of the overall endeavour. The converse is also true, with lexicographical works such 
as Roget and the derived WordNet being popular amongst computer scientists for cate- 
gorization systems (Fellbaum 1998a). The general pattern of arrangement of a thesaurus 
is outlined below. 
22.2.1 Macrostructure


Scratch any thesaurus-maker and you are likely to find not only a lexicographer but 
also someone ambitious to impose a degree of structure on the apparent disorder of 
our world. To some extent, this is inevitable, since a thesaurus has no predetermined 
order; one has to choose a starting point, and each category thereafter has to be 
related in some relatively transparent way to its fellows. So while thesauruses are pri- 
marily constructed from the bottom upwards, with words and concepts as their main 
building blocks, they require a high-level structure to be in place during their crea- 
tion, splitting the world into major classes or categories under which other divisions 
can be placed. It is entirely possible to classify the semantic domain of Plants, say, or 
Emotions, in a wholly data-driven fashion, but it is much harder to then continually 
work upwards to an entirely empirical framework; decisions must be made, often 
on philosophical, psychological, or (more frequently) pragmatic grounds, about how 
to represent the large, abstract notions under which everyday life can be subsumed. 
These top-level classes often include such considerable concepts as Life, the Universe, 
Matter, and so forth (see Fischer 2004 for a comparative investigation of these struc- 
tures). Such choices will often throw light on the intellectual climate and prevailing 
world view(s) of the time when the thesaurus was produced. Hüllen (1999), for exam-
ple, points out the shifting position of God in early classifications: initially usually at
the beginning as the omnipotent creator, but increasingly towards the end as part of 
the social artefact of Religion, although occasionally somewhere in the middle. 

Such major divisions form a thesaurus’ macrostructure. They are frequently dis- 
cussed in any prefatory matter or in the lexicographical literature, as their arrangement 
is necessarily arbitrary, although ideally logical. At the high level of abstraction where 
these divisions operate, it is difficult to argue for the efficacy or naturalness of any of 
them, although pragmatic arguments can often be made for each. 

The most famous macrostructure is Roget's, with 1,000 categories split into six major 
classes. Roget’s macrostructure is shallowly hierarchical, having only a few layers of 
superordinate categories within which to place the concepts covered by the lexemes it 
collects, but it nonetheless illustrates the hierarchical framework which is necessary to 
unite a thesaurus’s data-driven microstructure with the abstract macrostructure (see 
further Section 22.6.2). 

The most complex thesaurus macrostructure is that of the 2009 Historical Thesaurus 
of the Oxford English Dictionary (HTOED), also known as the Historical Thesaurus of 
English, which was based on the second edition of the Oxford English Dictionary (OED), 
in addition to other sources.1 The HTOED system begins with three major divisions: the


1 Electronic access to the HTOED is through either the OED website at <http://www.oed.com> (this 
contains only those items in HTOED which are present in the OED, and therefore does not include 
material only found in Old English), or through the Glasgow Historical Thesaurus of English website at 
<http://www.glasgow.ac.uk/thesaurus>, which contains the full database with explanatory material, and 
uses a revised hierarchical structure. 
physical world, the mental world, and the social world, and then proceeds into a hier-
archical structure holding up to twelve nested levels of sub-categorization. This system
is necessary to adequately organize the over three-quarters of a million lexemes the 
work holds. 
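
A nested classification of this kind can be modelled as a tree in which every category may hold both words and further sub-categories. The sketch below is illustrative only: the top-level label is one of the three HTOED divisions named above, but the lower labels, the sample words, and the class and function names are invented for the example and do not reproduce the actual HTOED scheme.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a hypothetical thesaurus hierarchy."""
    label: str
    words: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a sub-category and return it, so calls can be chained."""
        self.children.append(child)
        return child


def depth(node):
    """Number of nested levels at and below `node`."""
    return 1 + max((depth(c) for c in node.children), default=0)


def path_to(node, word, trail=()):
    """Return the chain of category labels leading to `word`, or None."""
    trail = trail + (node.label,)
    if word in node.words:
        return trail
    for child in node.children:
        found = path_to(child, word, trail)
        if found:
            return found
    return None


# Invented lower levels under a real top-level division.
root = Category("the physical world")
plants = root.add(Category("plants"))
trees = plants.add(Category("trees", words=["oak", "ash"]))

print(depth(root))
print(path_to(root, "oak"))
```

The path returned for a word is what gives a thesaurus entry its meaning: the word is located by the chain of increasingly general categories above it, not by its spelling.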

Finally, a thesaurus macrostructure can be as idiosyncratic as its compilers wish it 
to be; for example, P. R. Wilkinson’s Thesaurus of Traditional English Metaphors (2002) 
bases its classification on a nursery jingle, having as its highest-level divisions Tinker, 
Tailor, Soldier, Sailor, Richman, Poorman, Beggarman, and Thief, with Wilkinson adding 
categories of At Home, At School, and At Play. This results in such startling sub-categories
as G.2b ‘Hostile receptions with mud’ and G.2d ‘Hostile receptions with dogs’, con-
tained under Beggarman (385-6).


22.2.2 Microstructure 


Within a macrostructure, a thesaurus lumps or splits its contents to varying degrees 
of granularity, as do all other works of lexicography. A distinction can be drawn 
between those thesauruses which focus on distinctive microstructures, attempt- 
ing to find classes with contents which are recognizably semantically discrete from 
all others of similar meaning, and those which have cumulative microstructures, 
wherein the classes assembled contain many words with a relationship of similar- 
ity to each other, but are not finely distinguished in their meaning. An example of 
the former is the HTOED, with 236,000 categories for 797,000 words (an average 
of 3.4 words per category), while the latter is best exemplified by the 2002 edition 
of Roget, which has 1,000 categories for ‘over 300,000 words’ (approximately 300 
words per category). In the case of a cumulative arrangement, the internal layout of 
a category list can be in order of frequency of use, alphabetical arrangement, parts 
of speech, or in a more impressionistic style which depends on the intuition of the 
compiler. 
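
The contrast in granularity can be checked directly from the figures quoted above. The short calculation below uses only those figures; note that the Roget word count is stated as ‘over 300,000’, so its ratio is a lower bound. The function name is an assumption made for the sketch.

```python
def words_per_category(words, categories):
    """Average number of words held in each category."""
    return words / categories


# Figures as quoted in the text above.
htoed = words_per_category(797_000, 236_000)   # distinctive microstructure
roget = words_per_category(300_000, 1_000)     # cumulative microstructure

print(round(htoed, 1), round(roget))
```

On these figures a Roget category holds on average roughly ninety times as many words as an HTOED category, which is the difference between a list of loosely related words and a near-synonym set.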

Each of these two structures has advantages and disadvantages. The primary disad- 
vantage of a distinctive microstructure is the time and energy required to draw apart 
the subtleties of each word, arrange it into a separate category, and then decide where 
this category sits in relation to the enormous number of other categories which are 
necessitated by this system. For this reason, thesauruses in the distinctive camp often 
have a complex hierarchical structure to deal with the large number of categories
required. The advantage of this style is that it gives precise and detailed information 
about each word in turn; similarly, the lack of this precision is often the disadvan- 
tage of cumulative microstructures, and tales abound of students misusing Roget by 
assuming any word in a Roget entry can function as a strict synonym for any other. 
The comparative ease of constructing a cumulative thesaurus, alongside its simpler 
structure for the user (a consideration mainly important in marketing rather than in 
actual usability), is the main advantage of this type of structure. 
22.3 PARTICULAR TYPES OF THESAURUSES


The comparative rarity of thesauruses means that they do not easily form a homogenous 
grouping. If the modern, synchronic thesaurus such as Roget is taken to be the stereo- 
typical work, then thesauruses which differ from this straightforward model are easily 
found. They fall into the (non-exclusive) categories below. 


22.3.1 Historical Thesauruses 


If thesauruses are rare, then historical thesauruses are even rarer. In addition to the 
usual problems common to lexicography, historical lexicographers have to engage with 
issues such as scarcity of evidence, changing world views, and lack of appropriate ency- 
clopaedic knowledge on the part of both compilers and users. The basic data usually 
come from historical dictionaries. It could be argued that creating thesauruses from dic- 
tionaries imposes another editorial layer between the lexicographer and the texts; on 
the other hand, there is little point in repeating work already done by other lexicogra- 
phers. Problems can arise if the dictionaries disagree with one another, or the thesaurus- 
maker disagrees with the dictionary, or if, despite everyone's best efforts, the meaning 
of an older word is not fully known. Thus, in A Thesaurus of Old English (TOE, 1995), 
which classifies the Anglo-Saxon vocabulary surviving from the late seventh century AD 
onwards, the editors ended up with a category called simply ‘Unidentified plants’. Plant 
names are notoriously difficult to interpret—for example, there is no parallel category 
for animals, which are much more readily identifiable. 

One solution to the problem of changing world views is to allow the classification 
to emerge from the words rather than impose a classification upon them. This is the 
approach taken in Spevack’s Shakespeare Thesaurus (1993), which derives from an anno- 
tated database of Shakespeare’s work. He writes that ‘a pragmatic cycle of shuffling and 
filtering and reshuffling of the vocabulary has determined the classification: that is, the 
names were supplied after the groups began to take shape’ (1993: ix). A similar proce- 
dure was followed in TOE and HTOED. All three also set up sub-categories when the 
weight of vocabulary demanded it; to take Spevack’s example, the lexicon suggested not 
only a category of ships, but also categories for various parts of ships, sailors, and navi- 
gation. Only when this stage has been reached can one begin to think about the role 
and importance of seafaring to the Anglo-Saxons, Shakespeare, and the Elizabethans, or 
English speakers over the centuries. 

Such folk taxonomies, informed by what Hallig and von Wartburg describe as ‘naive 
realism’, guided by ‘the intelligent average individual’s view of the world, based on 
pre-scientific general concepts made available by language’ (cited in Ullmann 1962: 255), 
work well for thesauruses of languages remote from science such as Buck's A Dictionary 
of Selected Synonyms in the Principal Indo-European Languages (1949) or TOE. Once 
expert scientific taxonomies become available, and part of at least some people's world 
views, the classifier may want to take account of both, as John Wilkins did in 1668 when 
he was confronted with John Ray’s classifications of plants and animals, and abandoned 
his original functional taxonomy of ‘plants for pleasure’, ‘plants for nourishment’, and 
‘plants for medicinal purpose’ (Hüllen 1999: 262-3). For HTOED, classifying tens of 
thousands of words across the entire 1,300-year written history of English, the expert 
taxonomy was often the best solution for major scientific sections, although folk cat- 
egories like Domesticated Animals or Cultivated Plants were included when justified 
by the data.² 

Within the macrostructure, historical thesauruses which regard their data as belong- 
ing to a single period will usually display synonyms in alphabetically organized lists. 
Those with a diachronic spread will order lists chronologically, or will compromise by 
including some information about dates of use within an alphabetical list, as The Scots 
Thesaurus (1990) does. Headings will usually be in modern English, since it is unreal- 
istic to expect the generality of users to be familiar with older English or Scots. By the 
same token, older or more obscure words may be omitted from the index. In TOE and 
HTOED, considerable care was expended on devising headings which would supply 
more information than is usual in thesauruses, by a process of leading back from a spe- 
cific to a general idea. The HTOED category Lexicography, 02.07.04.05.04, is shown in 
Figure 22.1 (note that the numbering of the printed HTOED shown in the figure differs 
in some respects from the updated numbering given here). Its numbered position can 
be tracked back through 02.07.04.05 Linguistic Unit to 02.07.04 Study of Language, 02.07 
Language, and finally to 02 The Mind. Within the sample, 02.07.04.05.04|03 dictionary 
has the sub-category 02.07.04.05.04|03.03 parts of a dictionary entry, and, at the next 
level down, 02.07.04.05.04|03.03.01, a category for one of these parts, the head-word. 
Users of the electronic version can toggle back to the OED to discover that Wilkins 
was first to use the term dictionary-making, while Dalgarno, followed by Johnson and 
Boswell, used lexicography, or to enjoy Swift's pertinent observation of 1726 under dic- 
tionary-maker: ‘Writers of Travels, like Dictionary-Makers, are sunk into Oblivion by 
the Weight and Bulk of those who come after, and therefore lie uppermost’. 
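
The tracking back from a specific to a general idea can be sketched in code. What follows is a minimal illustration only, assuming nothing beyond the notation quoted in this section (dot-separated levels for main categories, with sub-category levels following a vertical bar). The names LABELS, ancestors, and trail are inventions for this example and are not part of HTOED or of its electronic version.

```python
# Headings as quoted in the text above; the lookup table itself is illustrative.
LABELS = {
    "02": "The Mind",
    "02.07": "Language",
    "02.07.04": "Study of Language",
    "02.07.04.05": "Linguistic Unit",
    "02.07.04.05.04": "Lexicography",
    "02.07.04.05.04|03": "dictionary",
    "02.07.04.05.04|03.03": "parts of a dictionary entry",
    "02.07.04.05.04|03.03.01": "head-word",
}


def ancestors(code: str) -> list[str]:
    """Return the broader categories of a code, nearest first."""
    main, _, sub = code.partition("|")
    main_parts = main.split(".")
    sub_parts = sub.split(".") if sub else []
    chain = []
    # Peel off sub-category levels first: 03.03.01 -> 03.03 -> 03.
    for i in range(len(sub_parts) - 1, 0, -1):
        chain.append(main + "|" + ".".join(sub_parts[:i]))
    if sub_parts:
        chain.append(main)  # the main category that holds the sub-categories
    # Then peel off main-category levels: 02.07.04.05 -> ... -> 02.
    for i in range(len(main_parts) - 1, 0, -1):
        chain.append(".".join(main_parts[:i]))
    return chain


def trail(code: str) -> list[str]:
    """Headings from the given category back to its top-level class."""
    return [LABELS.get(c, "?") for c in [code] + ancestors(code)]


print(" < ".join(trail("02.07.04.05.04|03.03.01")))
```

Read from left to right, the printed trail leads from head-word back through dictionary and Lexicography to The Mind, which is the information that the fuller HTOED headings are designed to convey.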


22.3.2 Synonym Dictionaries 


While we have thus far discussed thesauruses as purely semantically arranged, there are 
also hybrid forms, which are sometimes called synonym dictionaries, containing small 
semantically-arranged lists of synonyms identified by a headword (e.g. some synonyms 
for pleasant), with thousands of those small packages then published in alphabetical 


2 For a discussion of the issues involved in classifying plants, see O'Hare (2004). Further details about 
HTOED can be found in its Introduction, and in Kay (2010, 2012a). 
02.08.04.05.04 (n.) Lexicography 
dictionary-making 1668- · lexicography 1680- · lexigraphy 1828/32- (Dict.) · dictionary-work 1887- 
01 lexicographer: dictionarist 1617 · lexicographer 1658- · dictionary-maker 1727-1882 · word-catcher 1735- · dictionary-writer 1742 · lexicographist 1834/43- 
02 lexicographical writings: lexicographics 1716 
03 dictionary: dictionary 1526 · calepin 1568-1662 · world of words 1598-1696 · lexicon 1603-1848 · thesaurus 1736-1862 
03.01 specific dictionaries: alveary 1580 · gradus c1764- · Webster 1843- · the/an unabridged 1860-1894 · O.E.D. 1898- 
03.02 specific types of dictionary: interpreter 1687-4672 · etymologicon 1645-1862 · pronouncing dictionary 1764-1857 · rhyming dictionary 1775- · idioticon 1842-1883 · collegiate 1898 (N. Amer., also Dict.) · collegiate dictionary 1898- (orig. Dict.) · desk dictionary 1948- · learner's dictionary 1948- · reverse dictionary 1954- 
03.03 parts of a dictionary entry 
03.03.01 head-word/-form: main word 1888 · head-form 1962 · entry form 1962- · head-word 1966- 
03.03.02 lemma: lemma 1951- 
03.03.02.01 act of sorting into: lemmatisation 1967- 
03.03.03 label: label 1911- 
04 vocabulary/collection of words: vocabular 1530 · vocabulist 1530(2) · vocabuler 1530-1706 · vocabulary 1532- · nomenclator 1585-1707 · word-book 1598 · verbal 1599-1623 · lexicon 1656-1823 · nomenclature 1659-1745 · vocabula 1698 · vocab 1908 
04.01 one who compiles: vocabulist 1545; 1800 · nomenclator 1609-1622 
05 vocabulary of proper names: onomasticon 1710- 
05.01 one who compiles: onomastic 1609-1716 
06 glossary: glossary 1483- 
06.01 specific: microglossary 1955- 
06.02 one who compiles: glossarist 1782- 
07 dictionary of synonyms/antonyms: sylva 1675 · synonymicon 1813 · thesaurus 1898- (US) 
08 thesaurus: thesaurus 1852- 
08.01 specific: Roget 1940- 
09 list of key-words: word-index 1937- · thesaurus 1957- 
09.01 concordance: concordance 1387-1869 · concordant 1625 
09.01.01 one who writes: concordist 111 · concordancer 1888 
10 phrase-book: phrase-book 1594- · phraseology 1776 


FIGURE 22.1 HTOED ‘Lexicography’ 
© University of Glasgow 2009 


order. These use what McArthur, discussing alphabetic reference books generally, 
describes as the ‘line and blister’ model, where the line represents the alphabet and ‘each 
of the blisters represents a special group of synonyms that are best explained together, or 
a semantic field that should be kept reasonably unified, or a special subject that ought to 
be covered in depth in one place—despite the alphabet’ (1986: 156). 

These dictionaries, which often have ‘thesaurus’ in their titles, have the advan- 
tage of being somewhat quicker to access than a wholly semantically arranged work, 
and are often the main type found in schools, but they require either extensive 
cross-referencing (thereby removing their advantage of speed) or publishing the same 
information many times (with the same list of synonyms for pleasant repeated with 
slight variations under the headwords enjoyable, congenial, and so on). In other words, 
they require a good deal of compromise in balancing the convenience of the alphabet 
against what we might describe as thematic creep back towards the notional structure 
of the thesaurus. 


22.3.3 Learner’s Thesauruses 


Thesauruses for learners are popular in two main areas. 

The first is where language-teaching materials use a thesaurus structure as an intui- 
tive way of learning vocabulary in semantic groups; large numbers of language-learning 
textbooks use this system, usually with visual accompaniments for younger readers. In 
this area, it can be difficult to distinguish the somewhat fuzzy boundary between vocab- 
ulary textbooks semantically arranged, which are rarely called thesauruses by their 
compilers or users, and the thesaurus proper. 

‘Learner's thesaurus’ also describes an adult-learner-focused thesaurus, a stand- 
ard thesaurus of one of the types above, annotated in learner's dictionary style with 
more metalinguistic information and usage examples than a traditional thesaurus. An 
early example is McArthur’s (1981) Longman Lexicon of Contemporary English, which 
includes definitions, citations, style labels, illustrations, and grammatical informa- 
tion within its semantic themes. More recently, the 2008 Oxford Learner’s Thesaurus 
provides synonyms and antonyms within an alphabetical list of headwords, alongside 
usage notes, pronunciations, collocation lists, disambiguation information, and diagrams 
of scalar synonyms, all with the aim of assisting a learner of English. This work 
can be seen as continuing the innovative tradition of learner’s dictionaries in the latter 
half of the twentieth century. 


22.3.4 Domain Thesauruses 


The market also offers a handful of technical domain-focused thesauruses on subjects 
like Art and Architecture, but even these are often synonym dictionaries in disguise. 
Being limited to a single domain, where meanings are less controversial, expert taxono- 
mies are often available, and polysemy is rarely a problem, they are relatively easy to 
compile and are often mainly of interest to the expert. 


22.4 THE FUNCTION OF A THESAURUS 


A wide variety of uses can be found for a thesaurus, in any of the three forms outlined 
above (distinctive-semantic, cumulative-semantic, and synonym dictionaries). A the- 
saurus can first act as what Hartmann and James call an active dictionary, one which is 
‘designed to help with encoding tasks, such as the production of a text’ (Dictionary of 
Lexicography 1998: 3). This is perhaps the commonest use of a thesaurus, assisting writers 
in finding either an alternative term to one already used, or a more fitting word for a con- 
cept than the one which immediately springs to mind. Similarly, historical thesauruses 
can be used to give an air of authenticity to works of historical fiction, amongst others. 
Within thesaurus categories, one can also find the range of expressions available to 
a speaker to encode a concept, and thereby see those competing terms that a speaker 
chose not to use, thus enabling a literary scholar or a political historian, for example, 
to discuss word choice by a particular speaker and analyse the subtleties of picking one 
expression over its competitors. 

Similarly, in a comprehensive historical thesaurus, insights can be offered into both 
language and society. Often the size of a category and its level of detail will indicate the 
importance of an artefact or concept. TOE, for example, is lexically rich in many aspects 
of warfare, such as weapons and warriors, showing their role in the literature of the time, 
or at least in what has come down to us. The chronologically ordered HTOED can be 
used to pinpoint areas of lexical decline and growth, for example in technology, food- 
stuffs, and leisure activities in the modern period. It can also show relationships among 
words of similar meaning, as when substantial numbers of words of Old English ori- 
gin were displaced by French words after 1066 in domains such as Law and Religion. 
Sometimes semantic ordering can reveal connexions between words when there are 
long gaps in the record: Old English becca ‘a pick or mattock’ can be linked to OED’s beck 
in the same meaning, not recorded again until 1875. 

Those writing dictionary definitions can use a thesaurus to locate a word among its 
synonyms, which may have changed radically over history. Thus Welsh English gambo 
originally referred to a low, flat cart, but was subsequently extended to other rudimen- 
tary or makeshift forms of transport, including an old, dilapidated motor-car, which 
links it to the HTOED category listing such vehicles. The fact that the category is already 
well-established may justify a mention of this meaning in the definition of gambo or 
even recognition of a separate sense.³ Wordlists may also throw light on sound symbolism. 
Examining the HTOED category for Harsh, discordant sounds, and the abstract 
category of Complaint, provides fairly convincing evidence for the link between the /gr/ 
cluster and these concepts (Kay 2012b). 

Much can also be revealed by an in-depth study of a particular field. Wild, for example, 
traces the development of terminology for ‘young person’, ‘child’, and ‘baby/infant’ 
and notes how age is increasingly used as a classifier, as in toddler, pre-schooler, and teen- 
ager, suggesting ‘the increased attention paid to children as a section of society’ (Wild 
2010: 298). Alexander and Struan (2013) discuss the HTOED’s Civilization category, 
suggesting five separate metaphorical conceptions of those people considered ‘uncivi- 
lized’ throughout the history of English (namely the categories of wildness, crudeness, 
foreign-speaking barbarity, incivility, and the state of being significantly Other). 

A thesaurus also encodes the world views of its compilers. Roget's attitude towards 
women and sex is notoriously encapsulated in his thesaurus structure and categories, 
both in how he categorizes parts of the body and sexual activity, and in what he omits. 
Changes to Roget map how attitudes differed in the later twentieth century. 


3 We are indebted to Andrew Ball of the OED for this example. 
Finally, a thesaurus created on the basis of a complete and fixed corpus, such as TOE, 
contains information about the corpus itself—the paucity of words in this thesaurus for 
terms of endearment is not a reflection on Anglo-Saxon lovers, but on what writings 
survive. 


22.5 CREATING A THESAURUS 


Thesaurus creation differs relatively little from the creation of dictionaries: some raw 
data, either a corpus, a collection of citations, or another dictionary, are analysed in 
order to collect lexical facts, the most important of which for a thesaurus are the word 
form and its meaning (often the only pieces of information included in a thesaurus). The 
main difference is in the arrangement stage, which in a dictionary is easily done using 
the alphabet, but in a thesaurus often constitutes most of the work. If a macrostruc- 
ture is already in place, then the body of lexical items involved is split into the major 
classes, and the entirety of such classes is analysed in turn. The most commonly-used 
system here is a simple one, involving arranging the large bulk of these words, often 
in the time-honoured lexicographical paper-slip format, into large groups, then tak- 
ing each group in turn and creating smaller groups, then taking these smaller groups 
and categorizing them yet more finely, and so on until the desired level of granular- 
ity is met. In practice, there is often a lot of cross-reference and cross-pollination of 
word senses across these working categories. Decisions are necessarily pragmatic and 
focused on the data, which can make following a particular theoretical orientation quite 
challenging—although it is a notable benefit of such work that it can provide empirical 
data to feed into linguistic theories, such as prototype theory in cognitive semantics. 
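
The successive sorting described above amounts to a recursive partition, which the following sketch illustrates. Each pass stands in for an editorial judgement that in practice is made by a lexicographer, not by a program. The group labels echo Spevack's seafaring example in Section 22.3.1, but the words, their assignments, and the names JUDGEMENTS, PASSES, and subdivide are all hypothetical and are not taken from any published thesaurus.

```python
from collections import defaultdict

# Hypothetical editorial judgements: for each word, the label chosen at
# the first (coarse) pass and at the second (finer) pass.
JUDGEMENTS = {
    "galley": ("seafaring", "ships"),
    "bark": ("seafaring", "ships"),
    "mast": ("seafaring", "parts of ships"),
    "keel": ("seafaring", "parts of ships"),
    "mariner": ("seafaring", "sailors"),
    "oak": ("plants", "trees"),
}

# One function per pass; each returns the group a word is placed in.
PASSES = [
    lambda word: JUDGEMENTS[word][0],
    lambda word: JUDGEMENTS[word][1],
]


def subdivide(words, passes, depth=0):
    """Sort words into nested groups, applying one pass per level."""
    if depth == len(passes):
        return sorted(words)  # desired granularity reached
    groups = defaultdict(list)
    for word in words:
        groups[passes[depth](word)].append(word)
    return {
        label: subdivide(members, passes, depth + 1)
        for label, members in groups.items()
    }


tree = subdivide(list(JUDGEMENTS), PASSES)
print(tree)
```

The sketch leaves out what the paragraph above identifies as the hard part: the cross-reference and cross-pollination between working categories, which means that real classification is revised repeatedly and is seldom a single clean descent.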

A thesaurus can also be created by adding semantic tags to an existing lexicographical 
database, as was done with the Scots Thesaurus, but in practice this requires a very par- 
ticular set of skills to be done accurately out of semantic order, and adds another large 
field of practice to the work of hard-pressed lexicographers, particularly if it is done as a 
dictionary is being compiled. 


22.6 A BRIEF HISTORY OF THESAURUSES 




Although alphabetically arranged dictionaries now dominate the market, thematically 
organized thesauruses have a much longer history in the annals of lexicography.⁴ Hüllen 
(2009a: 27-8) notes that: ‘In classical antiquity, and even in the older Chinese, Sanskrit, 
and Arabic cultures, dictionary-making began with the compilation of lists, for which 


4 This section draws heavily on Hüllen (1999), to which the reader is referred for a much more 
detailed account of the early history of thesauruses. 
words were selected according to semantic principles’. These lists might comprise terms 
for domains such as animals, plants, or family relationships, and were often intended as 
aids to understanding older texts, such as the Homeric epics. Their purpose was thus 
didactic, and as they developed they assumed the further purpose of imparting infor- 
mation about the world as well as the terminology needed to discuss it. 

Latin texts with marginal or interlinear glosses in Old English (OE) survive from the 
eighth century AD onwards. Over time, these glosses were collected into wordlists, at 
first related to particular texts, then gathered into independent lists, sometimes alpha- 
betical but often in thematic order. Their primary purpose was the teaching of Latin. 
Favourite topics included the body and its parts, precious stones, medicinal herbs, and 
natural kinds such as animals, birds, fish, and plants. During the Middle English period 
(1150-1500 AD), increasing attention was paid to social domains such as the church, 
arts and crafts, and the home. In the fifteenth century, materials for learning vernacular 
European languages began to appear, stimulated by social changes such as the introduc- 
tion of printing, and increased literacy and mobility among the population. These often 
consisted of multilingual thematic lists, with words from up to eight languages appear- 
ing in parallel. 


22.6.1 Organizing the World 


Stimulated by scientific discovery, interest in classification gained momentum during 
the seventeenth century, while increased contact with other languages led to a fascina- 
tion with the idea of a universal language which might be understood by everyone. An 
early manifestation of these interests was Wilkins’ An Essay towards a Real Character, 
and a Philosophical Language, published in 1668. This enterprise is based on the assump- 
tion that all people perceive the world in the same way, and will therefore be able to 
communicate common concepts to each other through a set of universal symbols which 
transcend the limitations of actual languages, thereby returning humankind to happier 
prelapsarian times. There is no space here to do justice to Wilkins’ system,⁵ which 
presents his universal notions in a structured taxonomy leading from broad general 
classes such as ‘vegetative species’ to groups of synonyms for individual concepts such as 
‘spending’ and ‘keeping’. Suffice it to say that, although he aimed beyond a thesaurus of 
English, Wilkins’ work had a profound effect on the subsequent development of thesau- 
ruses, and especially on the work of Roget. 

Between Wilkins’ work and Roget's, there was very little interest in thesauruses. 
Attention was rather focused on ever larger and more sophisticated alphabetical dic- 
tionaries. Such interest as there was in a quasi-onomasiological approach manifested 
itself in alphabetical dictionaries of synonyms, which often had the purpose of improv- 
ing stylistic choice through discussion of the nuances of semantically close words. 


5 A full description and analysis is given in Hüllen (1999: ch. 8). 


22.6.2 Roget's Thesaurus of English Words and Phrases 


Roget's Thesaurus must rank as one of the publishing successes of all time. By the time 
of Kendall’s biography in 2008, an estimated 40 million copies had been sold (Kendall 
2008: 1). This figure includes the six editions published by Longman between 1852 and 
2002, and imprints from other publishers. 

Peter Mark Roget (1779-1869) had a distinguished career as a doctor and scientist before 
returning after retirement in 1842 to his earlier interest in wordlists. In the Preface to the 
first edition, he reports that ‘I had, in the year 1805, completed a classed catalogue of words 
on a small scale, but on the same principle, and nearly the same form, as the Thesaurus 
now published’ (2002: xix). He goes on to say that he had found such lists ‘of much use to 
me in literary composition’, thus asserting from the outset the main reason for his work’s 
popularity: its usefulness as a resource for those searching for an appropriate word. 

In the Introduction which follows the Preface, Roget explains how he had arrived 
at a ‘system of classification of the ideas which are expressible by language... arrang- 
ing them under such classes and categories as reflection and experience had taught 
me would conduct the inquirer most readily and quickly to the object of his search’ 
(2002: xxiii). This results in six primary classes: 1. Abstract relations, including 
Existence, Quantity, Order, Number, Time; 2. Space, including Dimensions, Form, and 
Motion; 3. Matter, its Properties, and Perception through the five senses; 4. Intellect; 
5. Volition; 6. Emotion, Religion, and Morality. Like subsequent editors, Roget was 
realist enough to add an alphabetical index. 

Within the overall structure, the 1,000 categories are subdivided by part of speech, usu- 
ally with further subdivisions on semantic grounds. Each division or subdivision has a 
headword of general meaning, followed by lists of what may, by a very generous definition, 
be regarded as synonyms, but may also include hyponyms, meronyms, and members of the 
same lexical field (on these and other sense relations see Murphy, this volume, especially 
Sections 27.3 and 27.4). As far as possible, the following category contains words of opposite 
meaning (‘correlative terms’; the term antonym was not yet in use), although the practice 
of laying out the thesaurus in parallel antonymic columns was abandoned from the 1962 
edition. Roget shows himself well aware of some of the problems users and critics may have 
with his work. He acknowledges the impossibility of completely substitutable synonyms, 
and, no doubt with experience of synonym dictionaries in mind, the equal impossibility 
of investigating all the ‘distinctions to be drawn between words apparently synonymous’, 
intending instead to ‘classify and arrange them according to the sense in which they are now 
used, and which I presume to be already known to the reader’ (2002: xxvii). 


22.7 AFTER ROGET 


One sure sign that a product has arrived is when a trade or personal name achieves the 
status of a common noun, as in ‘Have you got my Roget?’ In the HTOED Lexicography 
category, only two people are accorded this accolade: Roget and Webster. Recognition 
of the merit of Roget's work was not, however, instantaneous. Emblen has some inter- 
esting examples of early reviews, and writes: ‘Most journals and papers that reviewed 
the Thesaurus were reservedly complimentary and somewhat bewildered as to how 
one would use the thing’ (Emblen 1970: 272). Nevertheless, the popularity of the book 
grew, and after the crossword puzzle boom hit North America and Britain in the 1920s, 
it became an indispensable part of any library (Emblen 1970: 278, 281). 

American editions of Roget appeared from 1854, with Thomas Y. Crowell and 
Company taking over as publisher in 1886 and subsequently producing new editions 
under the title of Roget’s International Thesaurus (Emblen 1970: 282). There was some 
tweaking of Roget’s scheme of classification, for example in Chapman's classification 
into fifteen main categories in the fifth edition, on the grounds that Roget’s scheme 
‘does not coincide with the way most people now apprehend the universe’ (Roget’s 
International Thesaurus 1992; quoted in Fischer 2004: 43; see also Hüllen 2009a: 44). 
A French edition appeared in 1859 with Roget’s approval (Hüllen 2009b: 76; Kendall 
2008: 266), and there have been versions in German and other European languages 
(Hüllen 2009b: 60). 

From time to time, brave souls make a break for freedom and offer alternatives to 
Roget's structure, often by choosing a different starting point and reorganizing the major 
classes. Two such were Franz Dornseiff’s Der deutsche Wortschatz nach Sachgruppen 
(1933), and Rudolph Hallig and Walther von Wartburg’s Begriffssystem als Grundlage für 
die Lexikographie (Hallig and von Wartburg 1952). Dornseiff’s classification of German 
has twenty major classes, beginning with the Inorganic World, followed by Plants, 
Animals, and Humans, while Hallig and von Wartburg have ten classes in three broad 
groups: The Universe, Man, and Man and the Universe (Fischer 2004). It should be 
noted that theirs is a classification of concepts (in French) rather than a classification 
of the lexicon of a language as in Dornseiff or Roget; theoretically, it could be used to 
display the lexicon of any language. According to Ullmann, the scheme caused a good 
deal of interest when it was presented at the Seventh International Congress of Linguists 
in 1952, being seen mainly as a framework for displaying and comparing different languages 
or historical periods (1957: 314-15; see also Hüllen 1999: 18-21). There is no record 
of its being used in its totality or of having much effect on actual thesaurus-making. 

A more practical approach is taken in McArthur’s Longman Lexicon (1981), which 
is organized under fourteen major classes (themes), beginning with Life and Living 
Things. This work, which later influenced the database of the UCREL Semantic 
Annotation System (USAS), was particularly designed for foreign learners, given the 
logical use of a thesaurus structure for vocabulary learning, and began the later trend 
for corpus-driven and usage-sensitive learner’s thesauruses. A successor of USAS is the 
SAMUELS tagging system, which uses techniques from natural language processing to 
annotate words in text with their disambiguated meanings using the hierarchy reference 
codes from HTOED. This system, which allows the use of semantically oriented corpus 
linguistics in the analysis of text collections, is only possible because of the relational 
knowledge networks encoded within the thesaurus macrostructure of the HTOED. 
Later modern thesauruses which break away from Roget are the historical thesauruses 
TOE and HTOED, described in Section 22.3.1 above. These originate primarily from the 
work of Professor Michael Samuels, who set up the Historical Thesaurus of English pro- 
ject at the University of Glasgow in the 1960s. 


22.8 THE WAY AHEAD 


No lexicographical field is immune to the disruptive effects of new technology, and this 
is perhaps most true in the field of thesauruses. The distinction made in this chapter 
between a synonym dictionary and a thematic-semantic thesaurus is one which breaks 
down immediately on entering the electronic arena. A synonym dictionary, which 
reproduces under given headwords a subset of terms in a semantic thesaurus’s microstructure, 
is only necessary in a printed form; an electronic equivalent would simply be 
a thesaurus database which dynamically provides a semantic field based on the original 
search word. 
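
A minimal sketch may make the point concrete. It assumes a thesaurus held as a mapping from category headings to word lists, here four headings and their words taken from Figure 22.1 with the dates omitted; the names CATEGORIES, build_index, and semantic_field are inventions for this example and do not describe any existing thesaurus database.

```python
from collections import defaultdict

# Four categories from Figure 22.1, dates omitted.
CATEGORIES = {
    "03 dictionary": ["dictionary", "calepin", "world of words", "lexicon", "thesaurus"],
    "07 dictionary of synonyms/antonyms": ["sylva", "synonymicon", "thesaurus"],
    "08 thesaurus": ["thesaurus"],
    "09 list of key-words": ["word-index", "thesaurus"],
}


def build_index(categories):
    """Map each word to every category heading under which it is listed."""
    index = defaultdict(list)
    for heading, words in categories.items():
        for word in words:
            index[word].append(heading)
    return index


def semantic_field(word, categories, index):
    """For a search word, return its companions under each of its headings."""
    return {
        heading: [w for w in categories[heading] if w != word]
        for heading in index.get(word, [])
    }


INDEX = build_index(CATEGORIES)
for heading, companions in semantic_field("thesaurus", CATEGORIES, INDEX).items():
    print(heading, "->", companions)
```

A search for thesaurus returns all four headings with the companion words listed under each, which is in effect a set of synonym-dictionary entries generated on demand from the semantic structure, with nothing stored twice.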

Extending this line of thought, in a time of instant database search results, the sole 
advantage of alphabetical order is entirely lost, and its disadvantages dominate; it places 
unrelated items next to each other, it loses the opportunity to make connections useful 
to a user, and its layout prohibits easy browsing on one particular subject. Dictionary 
data, laid out in a thesaurus structure, becomes the most attractive hybrid, rather than 
the converse as now. When scholarship and reference works in the digital age require 
not isolation and fragmentation, but rather union and seamless integration to best serve 
a user’s needs, the strictures of the alphabet become hindrances rather than advantages, 
and relics of a printed form which is in decline. Without too much hyperbole, it can 
easily be predicted that the thesaurus, the oldest form of lexicography and one domi- 
nant for centuries before the printing revolution, could again become dominant in the 
post-print digital future. 


CHAPTER 23 
23.1 THE LEXICOGRAPHICAL NEEDS 
OF THE DIALECTOLOGIST 


Any consideration of the involvement of a linguist with material which might be 
thought of as dialectal must of necessity concern variation: there can be no dialectal 
item which does not stand in contrast to others, set off from them not only by funda- 
mental considerations of semantics but also by differences of geographical and/or social 
distribution, which also of course contribute to meaning in a wider sense. That is, whilst 
there exist dialect synonyms which at a superficial level mean the same thing, dialec- 
tologists have as their task the probing of the meaning of the differences which exist 
in these so-called variants for any particular concept, or variable, in terms of location 
in place or social group. Their aim in doing this is to shed light on such matters as the 
mechanisms by which language spreads and changes over time. 

It follows from this need of the dialectologist to focus on variability that all instances 
of variation in their data are, or at least might be, significant. It is well known that lexi- 
cographers have decisions to make about the degree to which they are to combine senses 
or to separate one from another, to what extent they are to be lumpers or splitters in the 
treatment of their data. For the mainstream dictionary maker, Hanks (2000: 208) rea- 
sonably sees these decisions as being determined by marketing considerations, and 
these are certainly an issue for the dialect lexicographer as for any other. But with focus 
needed on any and every instance of lexical difference, the imperative on dialect lexi- 
cographers is not a mainstream one. If they are to do justice to their material, and so to 
the users of their dictionary product, they are called upon to be the ultimate splitters. 
All their data are in variation and so must be contrasted, whether in the form of separate 
dictionary entries or within the detail of those entries. It is axiomatic that this variation 
is not ‘free’, a speaker not being assumed to be adopting linguistic forms with any degree
of whimsy, taking them up and dropping them in a random fashion. Rather, linguistic 
choices are regarded by the variationist as essentially significant in some regard, regional 
or social or both, and consequently important to record for study in the cause of achiev- 
ing a better understanding of the courses, motivations, and mechanisms of language 
change. There is orderly, or structured, heterogeneity (Weinreich et al. 1968: 99-100) in
data which can hold the key to what is taking place in the language of the communities 
studied. It follows that, to the extent that practicalities and resources allow, a linguist 
creating a dictionary record of non-standard lexis for the purposes of proper dialec- 
tal examination must consider for inclusion everything they can accumulate, and omit 
nothing, however inconsequential it might at first appear or however irrelevant it might 
be for their own immediate purposes.


23.2 TWO BENCHMARK DIALECT
DICTIONARIES 


A classic dictionary of the English dialects against which others can usefully be com- 
pared is the English Dialect Dictionary (EDD: Wright 1898-1905). This six-volume work 
remains a major source of information for anyone wanting a proper understanding of 
English, especially but not only British English, non-standard lexis. 

EDD was the culmination of the efforts of an army of amateur and professional schol- 
ars, many of whom were associated with the English Dialect Society which was specially 
founded for its creation. The fact that, on handing over the last of its findings to its edi- 
tor, the Society considered its work done and promptly disbanded might seem remark- 
able today in light of the information on dialectal lexis which has subsequently come to 
light. It is, nevertheless, testimony to the thoroughness with which the existing record 
had been trawled, and with which the information had been presented. Moreover, in its 
design and execution, the EDD showed the way in which the non-standard, problematic 
in a number of ways for lexicologists in general and lexicographers in particular, might 
usefully be approached. And the dictionary’s treatment of its lexicon demonstrates well 
issues with which the lexicographer of the non-standard is faced, and pitfalls which 
must be avoided if the subject is to be properly treated. 

At first sight EDD is conventional in design. Indeed, it has much in common with the 
Oxford English Dictionary (OED), for which it was conceived to act as the non-standard 
counterpart (Penhallurick 2009: 301-2). The headwords, alphabetically arranged, are 
explored in detail in a plethora of hot-metal fonts and font sizes, in a system which 
might at first seem somewhat overwhelming to the modern reader but which, once 
mastered, allows identification of an entry’s constituent parts. These parts in outline 
typically comprise: headword; word class; overall geographical distribution; definition; 
geographically-located citations. Etymologies are cautiously ventured upon, good sense 
being shown in avoiding the fanciful conjecture which can sometimes accompany the
discussion of the non-standard. Pre-IPA phonetic transcription is used infrequently. 
A system of abbreviations directs the user to sources and to informants: the system of 
initials used to identify correspondents and compilers of word lists is not immediately 
transparent but, once the connection between abbreviations and prefatory lists has been 
mastered, the successful and necessary economy achieved by the device is understand- 
able. The whole amounts to an invaluable resource, and a testimony to dialect scholar- 
ship creditable in any age: Penhallurick (2009: 301-6) provides a comprehensive and 
insightful overview of the work, from its inception to its ultimate achievement. 

One unavoidable but quite serious shortcoming reduces the usefulness of the printed 
EDD, however. A dialect dictionary acts as a catalogue of variants of a variable, the vari- 
able being a particular notion or concept which is identified by the definition. Lacking 
any system of cross-referencing, the simple presentation of headwords alphabetically 
leaves each lexical item marooned, separated from its semantic stable-mates. It is of 
course hard to envisage a way in which this deficiency might successfully have been 
overcome in a paper-bound work already of considerable size. Cross-referring each 
headword, and each sub-entry under a headword, to all others to which it was con- 
nected by meaning would have been quite impracticable, and a coding system to effect 
the connection would have made still more complex an already complicated text. And 
even had space allowed, such an operation would have been beyond the reach of the 
dictionary team’s resources. It has therefore been left until the computer revolution of 
recent years for this lack to be corrected. The Spoken Early English Dialect (SPEED) 
project, led by Manfred Markus at the University of Innsbruck
(<http://www.uibk.ac.at/anglistik/projects/speed/startseite_edd_online.html>; Markus and Heuberger 2007;
Onysko et al. 2009; Markus 2010; Praxmarer 2010), has succeeded in capturing and pre- 
senting EDD electronically, in so doing providing the user access on screen to the con- 
tents of six bulky volumes, and allowing them to cross-refer by meaning and to perform 
complex searches not only on chosen variables but on distributions and sources (see for 
example Ruano-Garcia 2012). With SPEED, EDD can rightly be said to have come of age 
as an invaluable lexicographical tool in modern English dialectology. 
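
The cross-reference by meaning that the printed EDD lacks can be illustrated with a short sketch. It is offered only as an illustration of the principle and is not a description of how SPEED itself is built: the `Entry` fields loosely follow the outline of entry parts given above, the class and function names are hypothetical, and the sample entries and distribution summaries are simplified from the beak/bill/neb and play/laik examples discussed in section 23.4.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    headword: str
    word_class: str
    concept: str        # the variable that the definition identifies
    distribution: str   # free-text summary, illustrative only


def index_by_concept(entries):
    """Group headwords under the concept (variable) they express."""
    index = defaultdict(list)
    for entry in entries:
        index[entry.concept].append(entry.headword)
    return {concept: sorted(words) for concept, words in index.items()}


def stable_mates(entries, headword):
    """Return the other variants that share a concept with the headword."""
    index = index_by_concept(entries)
    mates = set()
    for entry in entries:
        if entry.headword == headword:
            mates.update(index[entry.concept])
    mates.discard(headword)
    return sorted(mates)


# Hypothetical sample data, simplified from the chapter's own examples.
ENTRIES = [
    Entry("beak", "n.", "bird's beak", "predominant in England"),
    Entry("bill", "n.", "bird's beak", "isolated enclaves"),
    Entry("neb", "n.", "bird's beak", "parts of northern and eastern England"),
    Entry("laik", "v.", "to play a game", "especially Yorkshire"),
    Entry("play", "v.", "to play a game", "general"),
]

print(stable_mates(ENTRIES, "neb"))
```

An alphabetical listing holds only the first of these views, the entry itself; the concept index is what reunites each item with its semantic stable-mates.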

The equivalent of EDD in the United States is the Dictionary of American Regional 
English (DARE: <http://www.dare.wisc.edu>). Although with roots in a late 
nineteenth-century drive to emulate Wright's work with EDD, faced with the enormous 
size of the task of collecting lexis from across the United States, DARE did not publish 
its first volume until 1985, by which time computerization allowed the kinds of sophis- 
tication in data handling and presentation which even SPEED cannot approach, given 
the restrictions imposed by EDD. There are six printed volumes (DARE 1985-2013) and 
an electronic web-based version which enables subscribers to undertake SPEED-like 
searches. Unlike EDD, DARE was underpinned by a concerted period of expert fieldwork,
1965-70. This permitted the inclusion of social profiling of informants, and the
creation of computer-generated distribution maps from the start of publication, and 
to these the website will add sound recordings as well as the electronic search facility. 
The typical DARE entry thus contains the same information as EDD as regards forms, 
variants, distributions and sources (including a good deal of cryptic abbreviation), but
with its late start advantageously permitting now, and promising for the future, a good 
deal of linguistic and sociolinguistic sophistication. 

We might justifiably take EDD and DARE as prime examples of a lexicographical 
genre as it has evolved over recent time, which bring to light the structures which a user 
might reasonably and realistically expect. Nevertheless, their study confronts us with 
still more fundamental issues concerning the whole enterprise that is dialect lexicography.
Immediately, we must decide what is to be understood by the term ‘dialect’. And
even if this has been decided satisfactorily, in many cases it is not easy to identify the 
dialect word. 


23.3 WHAT QUALIFIES AS ‘DIALECT’? 


Setting out on any lexicological undertaking involving dialect, we are immediately con- 
fronted with an issue of definition. Before beginning to present those items which we 
have classified as being dialectal, we must first define and justify what qualifies as the 
target. Just what is ‘dialect’? The term of course has a readily assumed, if simplistic, definition
relating to local or regional vernacular speech. This is undoubtedly the popular
default interpretation of what dialect is, and all dialectologists would undoubtedly con- 
cur with the view that language associated with particular places qualifies. Most would 
now not stop here, however. Definitions of ‘dialect’ abound, but an authoritative mod- 
ern one is provided by Raven McDavid: 


a dialect simply defined as a variety of a language, generally mutually intelligible with 
other varieties of that language, but set off from them by a unique complex of features
of pronunciation, grammar and vocabulary. Dialect, thus used, is not a derogatory 
term but a descriptive one. (McDavid, quoted in Kretzschmar 1979: 159)


The view here is set wide. To limit oneself solely to the notion that dialectal variation 
is only geographical is to cut oneself off from developments in the field of dialectology 
which have taken place most notably since the middle of the twentieth century, devel- 
opments which have seen study of social variability become as important a preoccupa- 
tion of the variationist linguist as is that of regional variability, and which see Standard 
English ranked as a dialect, albeit “an unusual dialect in a number of ways” (Trudgill 
1999: 123), alongside non-standard varieties however identified. All kinds of linguistic 
variation are then ultimately the province of the dialect specialist. To the premodifiers 
‘local’ and ‘regional’ must be added others such as ‘standard’ and ‘non-standard’, ‘traditional’
and ‘modern’, and especially in relation to this last ‘social’, carrying the strong
implication that dialectal variation is by speaker type as well as by speaker place. A further
premodifier, ‘occupational’, confirms still further the overlapping of dialectology
into the realm of sociolinguistic variation more generally. (It is instructive that Wakelin, 
while choosing essentially to restrict dialect to the regional for the purpose of a seminal
essay on the treatment of dialect in (general) English dictionaries, admits that the
regional-social (and occupational) distinctions are ‘impossible to maintain in practice’
(Wakelin 1987: 156).) Lexicography must be an essentially practical pursuit, and so must
eschew artificial distinctions, however well entrenched they might be. 

The association of dialect with differences in language use between various social 
groups, not just across communities but within one community, and the consequent 
identification of both regional and social dialects, has been driven increasingly in recent 
years by a focus not only on the facts of variation but on the interactional mechanisms 
by which linguistic distinctions are created, maintained, and changed. But any thought 
that the meaning attached to the ‘dialect’ label has undergone radical change of late is
mistaken. OED records parallel regional and social meanings for ‘dialect’ from the last
quarter of the sixteenth century onwards, involving both ‘subordinate forms or vari- 
eties of a language arising from local peculiarities of vocabulary, pronunciation, and 
idiom’ (i.e. the regional) and ‘speech peculiar to, or characteristic of, a particular person
or class’ (the social). For more than four hundred years, therefore, ‘dialect’ has carried
no necessary implication of geographical restriction, though it must be acknowledged
that it is the first meaning which has become the dominant one. No doubt in part as a
result of this, lexis associated with particular social groups has acquired a range of labels
other than ‘dialect’, foremost among which is of course ‘slang’. ‘Slang’ is not recorded by
the OED as in use before the mid-eighteenth century, some 150 years after the appear- 
ance of ‘dialect’ with both its meanings. At first it had only pejorative overtones (‘lan- 
guage of a low or vulgar type’), seemingly acquiring less marked social connotations 
relating to specialized vocabularies in the early nineteenth century. It is pre-dated in this
later sense by ‘cant’, though even this is not recorded until the later seventeenth century,
while ‘jargon’, always seemingly ‘applied contemptuously’ to non-standard or specialized
speech according to OED, dates from the mid-seventeenth century.

‘Dialect’ thus held the stage as the term to be applied to the non-standard lexical ele- 
ment of language in all its variety for approaching a century before other labels began to 
encroach on its territory. Wakelin (1987: 156) suggests that early lexicographers were disinclined
or unable to differentiate between kinds of ‘unprestigious’ language, ‘country’
or ‘vulgar’. It did, quite apparently, ultimately come to seem necessary for other terms
to be employed, especially contemptuously, to refer to the words used by marginalized 
social groups across society and, as significantly, by those in the professions. Wright 
clearly had a notion of slang that was not dialect, as is evident from those relatively few 
EDD compiling slips which have survived in the archives of Oxford University Press. 
Picking up on a reference to duffer with the meaning ‘a man who sold goods under false
pretences’ from Mayhew’s (1851) London Labour and the London Poor, for example, he
labels it ‘London Slang’ and omits it from his dictionary. He similarly discounts from the
dialectal record Australian duff, found in Rolf Boldrewood’s novel Robbery Under Arms
meaning ‘to steal’, as being ‘colonial and slang’. This notion of items being both regional
and slang is noteworthy, suggestive of an uncertainty as to what the two designations
precisely convey. But clearly, on the evidence of OED, ‘dialect’ never completely
lost its social meaning, and was never entirely reserved for the regionally restricted lexis
to which it is now commonly held to refer. With the advent of social, or sociolinguistic, 
dialectology in the twentieth century it can be seen to be assuming its pre-eminent place 
once more as an over-arching label for all lexis that is non-standard. 

It follows that neither in the past nor especially in the present, when the academic 
study of dialect has turned its focus firmly in the direction of socially determined dis- 
tinctions in speech within communities, have distinctions suggested by such style labels 
as ‘dialect’ and ‘slang’ been sufficient to disqualify a lexical item from the consideration of 
the dialectologist. That is not to say that another specialist with non-dialectal focus may 
not legitimately choose to concentrate on what they identify as ‘slang’, however defined.
It does, however, require dialectologists to be ready to consider all non-standard lexis, 
and for them to remain blind to rigid distinctions which might be imposed at the level 
of social use. Subscribers to the database of the BBC Voices project recorded many more 
instances of up the duff for ‘pregnant’ than any other variant (Upton 2013), undoubt- 
edly justifying an entry for duff in this sense in any treatment of the lexical record of 
the project: to attempt any separation of terms according to artificially imposed labels 
would seriously damage the record. And the dialectologist is to range still further than
the simple dialect-slang dichotomy: the use of the merely colloquial can be of social and 
regional significance too, as can those stylistic distinctions within the Standard noted 
for example in the U and non-U observations of Ross (1954). The remit of the dialect 
lexicologist, and therefore of the dialect lexicographer, is wide and multi-faceted. 


23.4 WHAT IS A DIALECT WORD?


As pointed out above, what qualifies words as dialectal is that they take part in 
systematic variation: there must be distinctions involved in their use which can be 
located in the separation of the users, whether by geographical location or by social 
grouping or purpose. Some dialectal variation is of course most obvious, this fre- 
quently signalled by distinctive regional distribution. If the item is supported by an 
underlying etymological distinction and has a corresponding historical settlement 
pattern attached, so much the better. A bird’s beak is variously a beak (from Old 
French), a bill (Old English), or a neb (Old Norse), with the latter especially now tend- 
ing to be restricted to older speakers firmly rooted in certain parts of northern and 
eastern England once settled by Vikings. Beak is the predominant item in England, 
exhibiting a distributional pattern which strongly suggests its expansion over time 
(Upton and Widdowson 2006: 140-1). The pattern implies the encroachment of beak 
on an earlier solid and well-established bill area, and a consequent fragmenting of this
area into smaller isolated enclaves especially in regions removed from the French- 
influenced heartland around the capital. (Of course, at the level of standard dialect a 
virtue has been made of this OE/OF resource to provide semantic distinctions in beak 
shape too.) Equally straightforwardly we can identify two words for ‘to play a game’,
play and laik (or lake). Like neb, the latter takes part in both regional and social variation,
being found today especially in Yorkshire, where it is used most especially by
speakers embedded in strongly local and traditional culture, and to some extent with 
older rather than younger speakers; play, as well as being the standard dialect word, is 
of course the local word throughout most of the English-speaking world. 

Such contrasting items, distinct in provenance, distribution, and shape, readily 
qualify as dialectal and so as dictionary headwords. What, though, of bridge and brig 
to signify a bridge? It is apparent that the latter variant has a geographical heartland in 
parts of northern England and Scotland (Upton and Widdowson 2006: 154-5). But the 
distinction might initially be taken to be merely phonological, and so not to qualify 
as warranting separate headword status in a dictionary. Separate Old English and Old 
Norse derivations, however, clearly require that each is taken to be lexically distinct. 
This is of course a comparatively straightforward etymological example, unlikely 
to prove problematic to the linguistically competent. But other pairings abound to 
confound the unwary. Easier to overlook as a lexical split is the grammatical sepa- 
ration inherent in such forms as him and ‘un for the him-pronoun. In rapid speech 
especially, the latter, still to be heard in the English West Country, might pass for the 
former, especially in unstressed position: it is only when its root in Old English mas- 
culine singular pronoun hine is appreciated that the significant distinction comes to 
be appreciated and the social as well as regional importance of the distinction being 
held is established. It follows that even the synchronically-focused dialect lexicogra- 
pher must be language-history aware in order to identify the words that are their tar- 
get, and to give them appropriate written shape when they have done so. 


23.5 WRITING SPOKEN WORDS 


However, the written shape of the identified dialect word, so essential for its cast- 
ing in dictionary form, is in itself frequently problematic. Story et al. (Dictionary of 
Newfoundland English 1982) can unproblematically identify maidenhair tea and moun- 
taineer tea as requiring separate treatment, while suggesting the latter as representing 
a probable (pronunciation) distortion of the former. But each of these items has a well- 
recognized modern English orthographic shape, and in the non-standard realm this is 
of course not always the case. How, for example, is the modern reflex of Old English 
hine now to be represented? Whatever the form in which it comes to the lexicographer, 
it must always be remembered that non-standard dialectal evidence is essentially oral. 
The available evidence might be a faithful representation of a pronunciation, recorded 
in phonetic script by a professional fieldworker. More likely, it will have been written 
using ordinary orthography by the user, or by an amateur collector. In the professional’s 
case there will be a series of sounds recorded, their accretion into obvious written forms 
being a secondary consideration to the collecting of the hard linguistic evidence. In the 
lay-person’s case, in a manner echoing the situation pertaining in the Middle English 
period where no standard English spelling system was universally acknowledged, there
will be an attempt to reproduce the spoken word according to what is understood of 
modern spelling rules. In either case the lexicographer can expect to encounter difficul- 
ties when constructing headwords for the record. 

EDD provides ample straightforward examples of this complication in the lexicogra- 
pher’s art. There are, of course, very many different orthographic renderings recorded 
there of Standard English words with non-standard meanings, entered in order to give 
an indication of likely pronunciations: headword gate, for example, is given the forms 
<gaayte>, <gheeat>, <yat>, amongst others, a long list of spelt forms in this case fol- 
lowing a short list of phonetic transcriptions. The form in which the entry headword 
is to be cast is, however, in no doubt. Immediately preceding this entry, though, the 
headword gatches is said to be ‘[a]lso written gatchers’, and apparently either form could
have been chosen as the headword, this without creating a problem for location since 
gatchers would immediately precede gatches alphabetically. A rather more problematic 
case is the entry for gobblet referring to a large, cast-iron pan, which carries the infor- 
mation that it is ‘[a]lso written goblet’: such a headword, were it to exist, would come 
seven places further on in the EDD alphabetic listing, and there is no reason to suppose 
that the searcher under the (Standard dialect) headword goblet would necessarily find 
gobblet too. And even were the dictionary user perhaps to find the gobblet entry while 
searching for the non-existent goblet, there are still more intractable problems of this 
kind. One example of this is sufficient to make the point. An entry under the headword 
corts, a Somerset rendering of carrots, states that it is ‘[a]lso written karts’. However, just
as there is no gatchers or goblet entry, there is no karts headword, leaving the dictionary 
user who searches only under that form to conclude that the dictionary does not con- 
tain it anywhere. It would be unfair to fault EDD overmuch on this matter, since there 
are actually many dummy headwords to be found of the kind ‘BYLEDDY, see Byrlady’,
‘FOURMART, FOURNER, see Foumart, Furner’, but it is nevertheless the case that,
when using this and many other dictionaries of non-standard English, it is necessary to 
be inventive in considering how an item has been spelt, followed by energetic hunting 
for it, sometimes to no good effect. 
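
The dummy-headword device amounts to an index from every recorded spelling to the entry that holds it. The sketch below is a minimal illustration under stated assumptions, not a reconstruction of EDD's practice or of any existing tool: the function names and the shape of the data are hypothetical, and the sample forms are those cited above.

```python
def build_lookup(entries):
    """Map each headword and each recorded spelling to its main headword."""
    lookup = {}
    for headword, variants in entries.items():
        # Plain assignment: a real headword always claims its own spelling.
        lookup[headword.lower()] = headword
        for variant in variants:
            # setdefault: a variant is recorded only if the spelling is
            # still unclaimed at this point.
            lookup.setdefault(variant.lower(), headword)
    return lookup


def find(lookup, form):
    """Return the headword under which a spelling is treated, or None."""
    return lookup.get(form.lower())


# Hypothetical sample data built from the forms cited in the text.
ENTRIES = {
    "gatches": ["gatchers"],
    "gobblet": ["goblet"],
    "corts": ["karts"],
    "Byrlady": ["byleddy"],
}

LOOKUP = build_lookup(ENTRIES)
print(find(LOOKUP, "karts"))
```

With such an index, a search under karts or goblet leads to the entry that contains it, where the printed alphabetical listing leaves the user to guess at the compiler's choice of spelling.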

The issue of suitable orthography can be still more difficult than these examples suggest,
going beyond that of finding a suitable spelling in which to reproduce a series of
sounds in headword form. This is the case when it is clear that speakers are pronounc- 
ing words but it is unclear what those words are among a range of possibilities. A clas- 
sic case of this curious phenomenon emerges from the data of the Survey of English 
Dialects (SED: Orton et al. 1962-71). Selected responses to question IV.3.9 of the Survey 
for the notion ‘ruts’, ‘What do you call those lines left behind by the cart-wheels when
the ground is soft?’, were [ka:ttaks], ["ka:'traks], [ka:tuaks], and [ka:tiaeks]. These pronunciations
are obviously ambiguous. And it is quite unrealistic to expect informants
who offered these responses to deconstruct what they said, to clarify whether the first 
element of the compounds is ‘car’ or ‘cart’, and the second ‘tracks’ or ‘racks’, both of
which pairs are feasible from the existing dialectal record. Local usage as regards the 
simplex forms and the evidence provided by recorded stress and the double articulation 
of consonants offer clues, especially as regards the first two examples. But when compiling
the SED dictionary (Upton et al. 1994) it was necessary to concede the ambiguity,
providing entries for ‘cart-racks’, ‘car-tracks’, and ‘cart-tracks’ in which the variants were
allocated as logically as possible but with notes cross referring each to the others. 

A further dimension to this problem of identifying just what word is being pro- 
nounced is also well exemplified in SED. A response from Devon for the notion ‘donkey’ 
was transcribed by the fieldworker collecting it as [kony:tor:], this subsequently being 
given the orthographic form <cornutor> by SED editors following consultation of the 
OED. Commenting on this, Wakelin (1977) writes: 


Some ...are words which are poorly attested ... and admit of no explanation for the 
present, although one may be found in due course. Examples which seem to fall into 
this class are cornutor ‘donkey’, found only as a second response at [SED locality]
Du, and otherwise utterly unknown... (Wakelin 1977: 67) 


During compilation of the SED dictionary it was, of course, necessary to revisit the 
Survey's phonetic transcriptions, properly to understand the record and to confront 
ambiguities (as with ‘ruts’ above). The editors were prompted by the pronunciation to 
re-write the headword as canuter*, the asterisk signifying divergence from the SED 
record and a note explaining that the item was ‘more probably formed on Canute, 
English king with reputation for obstinacy’ (Upton et al. 1994: 66). We cannot be abso- 
lutely sure about such an interpretation, of course, hence the defining qualification. But 
the implied meaning in what is, for a dialect informant, frequently a series of sounds 
rather than the articulation of a word which has recognized orthographic shape, must 
always be sought and, once found with some degree of certainty, that interpretation will 
determine the shape it is given in its spelling. This will be especially the case with appar- 
ent nonce-words, items coined for the occasion as is likely here (often when offering 
terms to a dialect researcher), or words which are idiolectal or confined to an inform- 
ant’s immediate circle. They must be given written form, of course, and such form, as in 
the case of both cornutor and canuter, is bound to be speculative. Whilst not being too 
dogmatic, the dialect lexicographer must be sufficiently bold as to hazard their opinion 
in such cases, and the user adventurous in tracking an item down. 


23.6 IN THE HANDS OF THE EXPERTS 


While a dialect lexicographer will have (one hopes and expects informed) notions 
of how words are spelt, however, it is likely that their informants will have their own 
ideas on the matter, ideas which must be taken seriously as possibly informing the 
non-standard linguistic record. And considerations of orthographical shape do not 
apply only to obviously non-standard lexical items, so that it is necessary to consider 
how much regard might be given to the form in which clearly standard-dialect items 
are cast. Elsewhere (Upton 2013: 182-3) this matter is discussed in relation to the word
skive, as it emerges in data from the BBC Voices project, which saw large-scale inputting 
of lexical data to a website constructed to collect information on lexical variation across 
the United Kingdom between 2005 and 2007. The use of a website for data collection 
of course required respondents to the survey to enter in written form their answers to 
prompts for variants, so that it was necessary for them to be literate (itself a concept 
open to many definitions). As was apparent for skive, a variant of the variable to play
truant from school, many people are in fact extremely cavalier in the way in which they 
spell quite well-known words: this fact is manifest throughout the large database created 
from the input data, where such spellings as <scieve>, <schive>, <skirve>, and <skehv> 
are to be found. It might be argued, of course, that the complexity created by the presence
of a multiplicity of written forms of a word can be cut through by reducing all to the
perfectly well-known Standard English spelling. But to do this is to lose the immediacy 
of the proffered data, and, in doing so, upon occasion to lose information which is vital 
for the dialectological record. At the level of pronunciation, for example, such a form 
as <skaive> might point to the existence of a long PRICE vowel¹ in an informant’s phonological
repertoire, a small point in itself but one which offers to take the researcher
well beyond the simply lexical when set alongside information on the informant’s geo- 
graphical and social profile and that of respondents from the same and nearby localities. 
The cumulative weight of information which is in a database such as that of BBC Voices 
validates its sometimes curious spellings, giving them much the same force that such 
spellings have in EDD, and whilst for the purposes of alphabetical listing the Standard 
English form might be adopted, others cannot be discounted from the record. The 
orthography in which an item is offered is akin to that phonetic transcription which is to 
be expected from a trained fieldworker, and a dictionary needs to take note of this fact 
and should make it available to the user in some form. (We might here set aside the addi- 
tional issue of the place of spelling in a discussion of literacy, which, though possessing a 
sociolinguistic dimension, is not strictly dialectological: on a lack of clarity on this issue, 
see Upton 2013: 184.) 

It might be thought from the foregoing that oral sources such as those from which 
dialect lexicographers are obliged to create their dictionaries are so problematic as to be 
unreliable. And certainly it is necessary to proceed with caution when using them. It is
for this reason that, following sound standard-dialect lexicographic practice of the time 
and obliged to work from material not gathered by rigorous fieldwork, Wright required 
printed evidence before he was prepared to admit an item to the EDD, although it should 
be noted that some of this evidence goes back only to his EDS sources, these often com- 
piled by amateurs from oral evidence. Uncertainty over non-standard evidence is war- 
ranted especially when, as with canuter above, only one instance is to be found, printed 
or oral; it is here that large data, such as those emerging from the BBC Voices database 
of some 785,000 responses, can induce some confidence in the researcher. Cassidy (1988:
327), writing of his experiences as editor of DARE, observes that ‘[p]eople are annoyingly
casual about the application of names, and until we have corroborative information
on a large scale our inferences must always be provisional’.


¹ Reference here is to the vowel-sound in such words as price, using the system of standard lexical sets
devised by Wells (1982).

Cassidy (1988: 327-8) does go on to offer us reassurance, however. He is unimpressed 
with the reviewer of DARE in the Times Literary Supplement who places overmuch 
stress on the reliance on written resources by the OED, seeming by doing so to question 
the reliability of that DARE material which is purely oral. He asserts that, quite to the 
contrary, 


localized informants who know the things of their own area and can vouch for them 
may never have heard names used in other areas. For our purposes, this gives a real 
value to ignorance. DARE informants were chosen always as people living at or near 
their birthplaces who had not traveled or stayed away long enough for their local 
speech to have been affected. (Cassidy 1988: 327) 


It is worth emphasizing this ‘real value [of] ignorance’. There will undoubtedly be 
amongst a dialectologist’s informants those who will mischievously insert bogus words 
into their asserted repertoire. There will certainly be those too who are confused in 
their offerings. And there will be others who, notwithstanding what Cassidy claims, 
will have been influenced from beyond the group for which they are taken to be sup- 
plying information. Nevertheless, while flagging up any well-founded unease, we have 
no cause summarily to discount material offered in good faith by Cassidy's ‘annoyingly 
casual’ people. DARE’s turkey buzzard, used to refer to two birds of quite different gen- 
era (Cassidy 1988: 326-7) in two quite different areas of North America, is, like pikelet 
that might be offered in place of standard-dialect crumpet by someone from the English 
Midlands, in no sense an error on anyone’s part. It is the task of the compiler of the dialect
dictionary to record what is offered, sifting through it and making observations on
likely veracity whilst taking care not to override what is real, if seemingly contradictory, 
knowledge. 

And there are occasions on which the informant’s knowledge simply must be taken 
at face value, this especially so when we are involved in the occupational-dialect sphere. 
General lexicographers can of course be expected to have consulted experts in those 
overtly technical areas in which they are not themselves proficient: no one would 
attempt to define the lexicon of nuclear physics without consulting the discipline's scien- 
tific practitioners and their literature. There is a definite danger, however, when attempts 
are made to define what might be termed sub-technical lexis, the lexis which is in eve- 
ryday use but which nevertheless properly has very precise meanings. The noun cleaver 
is readily applied to an implement used to cut through meat and bones, but a butcher 
would typically reserve it for a very large tool used two-handed, using chopper for the 
smaller tool employed in a shop or kitchen. Similarly, although the definition for the 
verb to fillet might be taken casually to mean ‘to remove the bones from a piece of fish 
or meat’ (OALD), technically a butcher would never fillet but would only bone meat, a 
fact recognized in the OED. The dialectal record is crucial here for a proper recording of 
lexical denotation: where the unwary lexicographer might assume that cleaver-chopper 
or fillet-bone variation is simply incidental, there exists instead systematic variation at 
the sub-technical level as meaningful as any at the most technical. 


23.7 CONCLUSION 
Regional and dialect dictionaries can readily be seen to have characteristics that mark 
them out from dictionaries of other types. The intention here has been to use a restricted 
set of texts and examples to concentrate attention on those characteristics, then to move 
beyond to detail some of the principal issues confronting the dialect lexicographer, in so 
doing countering a number of misunderstandings about dialect lexis, and about dialect 
more generally. There is certainly a regional element to the dialectologist’s subject, but it 
has never been wholly restricted merely to geographical variability, and especially from 
the second half of the twentieth century onwards its academic study has fully embraced 
the social. This fact in turn raises interesting issues about a ‘dialect’—‘slang’ interface, and 
the consequent use of terminology. However identified, the shape to be given for practi- 
cal purposes to lexical items which exist primarily in the spoken domain is problem- 
atic, but at the same time its confronting and resolution prompt valuable insights into 
a range of linguistic issues as diverse as those concerning pronunciation and literacy. 
Cautious and sensitive handling of the boundless variation which is the data of dialec- 
tology brings rich rewards, from material that might otherwise seem frustrating in its 
internal differences and its divergence from the Standard. 

Practical considerations must perforce place limits on the amount of data which 
dialect lexicographers may present, but they must remain alive to the messages of the 
underlying data, and seek ways to represent them in as much detail as is practicable. 
Their adherence to the principle of orderly heterogeneity requires as much. 


CHAPTER 24 
HOLGER BECKER 


24.1 INTRODUCTION 


THE aim of this chapter is to introduce the reader to specialized dictionaries. It will give 
an overview of some of the theoretical problems and discuss various macro- and micro- 
structural aspects of such dictionaries. The discussion will include the treatment of sci- 
entific and technical terms in general dictionaries in order to highlight the differences 
between general and scientific dictionaries.¹


24.2 SOME BASIC CONCEPTS 


24.2.1 Definition of Specialized and Technical 
Dictionaries 


Specialized and technical dictionaries are usually opposed to general dictionaries. While 
a general dictionary is said to deal with the general vocabulary, a specialized dictionary
contains terms of one or more special subject fields such as mathematics, economics,
or electrical engineering. For this reason such dictionaries are also called restricted.
Further names include special dictionary and language for specific purposes dictionary
(LSP dictionary). Note that some authors use specialized dictionary in a broader sense,
in which specialized is not limited to special subject fields but extends to varieties of a
language such as dialects or slang.


¹ I wish to thank Jonathan Blaney for proofreading the first version of this chapter.

The distinction is based on the notions of general language and special language. 
Note, however, that these notions are very difficult to delimit and that this problem can 
be approached from a number of different perspectives. For a discussion, see Becker 
(2005). 


24.2.2 Types and Users 


Specialized dictionaries come in a large variety of types, and various criteria have been 
suggested to classify them. 

Like general dictionaries, specialized reference books can be subdivided on the basis 
of the amount of encyclopaedic information they give. A specialized dictionary in a nar- 
row sense focuses on presenting linguistic information (such as orthography, grammar, 
pronunciation, and meaning). A (specialist) encyclopaedia contains information on 
things and concepts rather than just words. The encyclopaedic dictionary has features of 
both of these. While this classification has the disadvantage of presupposing that information
on language and information on things are easily distinguishable, it is very simple
and based on a long tradition. See Bergenholtz and Tarp (1995: 29f.), Bergenholtz 
(1996: 734-47), Zgusta (1971: 198f.), and Klosa (this volume). 

Wiegand (1988) established a classification using very similar terminology: language 
dictionary (G. fachliches Sprachwörterbuch), encyclopaedic dictionary (G. fachliches 
Sachwörterbuch), and all-round dictionary (G. fachliches Allbuch), a combination of the 
former two. His distinction is based on the notion of genuine purpose: from a language 
dictionary, for example, a potential user may derive information on a linguistic object. 
His approach, presented here in very condensed form, is widely discussed especially in 
the German literature, but it does not seem to be easily manageable and is based on a 
theory which is not easy to handle (Bergenholtz 1996: 746). Also, it remains unclear if it 
is relevant to user-oriented lexicography (Tarp 2008: 118). 

Specialized dictionaries can also be distinguished on the basis of the subject field(s) 
that the dictionary treats. If we consider a special field such as mathematics as the basic 
level in a hierarchy (see, e.g., Ungerer and Schmid 1996: 66-78), we can state that prob- 
ably the majority of specialized dictionaries aim at this level (single-field dictionary). 
There are, however, many works below and above this level. Thus, apart from single- 
field dictionaries, there are also sub-field and multi-field dictionaries (Bergenholtz and 
Tarp 1995: 58). A sub-field dictionary such as a dictionary of algebra can potentially 
cover a sub-field exhaustively, both in terms of the terminology and information given 
in the respective entries. 

One potential problem with sub-field dictionaries is illustrated by the Dictionary of
Classical and Theoretical Mathematics (2001; note that the title is misleading: the dictionary
in fact contains terms from ‘geometry, logic, number theory, set theory, and
topology’ and is thus a combination of sub-field dictionaries). The point is exemplified
by the definition for Betti number (a topological concept), which includes the terms
direct sum and torsion group. These are not defined in the dictionary, as they are algebraic
notions. Thus, a reader unfamiliar with these concepts would need to consult an additional
dictionary. It is common in mathematics to define one concept from one sub-field
in terms of concepts from other sub-fields. 
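
The gap illustrated by Betti number can in principle be checked mechanically once the terms used in each definition have been listed. The following is a minimal sketch in Python, not a description of any existing lexicographic tool: the function name undefined_terms, the data layout, and the entries compact space and topological space are hypothetical; only Betti number, direct sum, and torsion group are taken from the example above.

```python
def undefined_terms(headwords, terms_used):
    """Map each headword to the terms its definition uses but the dictionary lacks."""
    defined = {h.lower() for h in headwords}
    gaps = {}
    for headword, used in terms_used.items():
        # A term is a gap if it appears in a definition but has no entry of its own.
        missing = sorted(t for t in used if t.lower() not in defined)
        if missing:
            gaps[headword] = missing
    return gaps


# Toy word list (hypothetical apart from "Betti number").
headwords = ["Betti number", "compact space", "topological space"]

# Terms assumed to occur in each definition; this annotation is hand-made.
terms_used = {
    "Betti number": ["torsion group", "direct sum"],
    "compact space": ["topological space"],
}

gaps = undefined_terms(headwords, terms_used)
print(gaps)  # {'Betti number': ['direct sum', 'torsion group']}
```

A check of this kind only reports where a reader would be sent to another reference work; whether the missing terms should be added remains an editorial decision.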

Multi-field dictionaries may aim to treat a very broad field such as science or com- 
bine several fields. A common title for such a reference work is Dictionary of Science 
and Technology; there are also various business dictionaries that fall into this category. 
Such dictionaries have a smaller chance of exhaustively covering the terminology of the 
respective fields, and commonly focus on a small number of terms from each field. As 
Bergenholtz and Tarp point out, the coverage is often so small that only the most fre- 
quently used terms are covered at all, which leads them to conclude that ‘[i]n general, 
multi-field dictionaries are not to be recommended; except those that cover only a small 
number of interrelated fields’ (Bergenholtz and Tarp 1995: 59). 

The understanding of particular scientific and technical fields may change consider- 
ably over time, and this is reflected in the corresponding reference works. For example, 
early printed mathematical dictionaries are based on a much broader notion of math- 
ematics, and it was customary to cover a wide range of areas such as music, astronomy, 
and hydraulics. Nowadays, as sciences specialize and fragment, sub-field dictionaries are 
becoming a common phenomenon on the dictionary market. 

Maximizing dictionaries attempt to cover the greatest part of the vocabulary of a given 
subject field. Minimizing dictionaries, on the other hand, cover only a limited part of the 
field under consideration such as the most frequent terms. Multi-field dictionaries, for 
example, are usually minimizing (Bergenholtz and Nielsen 2003: 10). 

The name dictionary is by no means a reliable indication of whether a particular work 
of reference is a linguistic dictionary in the sense discussed above. There are, for example,
so-called specialized dictionaries that are encyclopaedic in nature. A brief discussion
of the topic is found in Opitz (1990: 1505f.). 

When classifying specialized dictionaries, and more importantly in their production, 
the role of the potential users is as important as in general lexicography. One central 
factor to be considered is their knowledge of the subject field under consideration. At 
the end of the 1980s, when the user perspective was not yet a central concern in special- 
ized lexicography (SL), the editors of a very large lexicography handbook (Hausmann, 
Reichmann, Wiegand, and Zgusta 1989-91) devoted two chapters to this: one on dictionaries
for experts (Opitz 1990), one on dictionaries for laymen (Kalverkämper 1990). Since then,
the differentiation has become more fine-grained. Bowker (2003: 156f.) distinguishes between
true experts (people with training or professional experience in a field), semi-experts
such as students or experts from related fields, and non-experts such as technical writers
or translators. Martínez Motos (2011: 6) differentiates between experts in the field,
semi-experts from related fields, learners, and, finally, mediators such as translators or 
journalists. This approach gives a more prominent role to learners of the field. 

The user’s knowledge of the language is at least partly related to the knowledge of the 
subject field. For example, translators can be assumed to have an expert knowledge in 
the languages involved. 

Various further parameters can be applied to classify specialized dictionaries. These 
include the number of languages involved, the way the dictionary is structured (alpha- 
betic, thematic), and the medium (Schaeder 1994: 31). Typologies of specialized dic- 
tionaries are discussed in Felber and Schaeder (1998), and, with respect to bilingual 
dictionaries, in Hausmann (1991). 


24.2.3 Lexicography and Terminography 


SL is closely related to terminology, which studies scientific and technical terms, and 
terminography, which is concerned with the recording and presentation of such terms 
in reference works, including not just dictionaries, but also term banks. 

A number of differences between SL and terminography have been put forward. The 
following list contains the most frequently named ones (see, for example, Bergenholtz 
and Tarp 2010: 28): 


1. In terms of the working method, SL is supposed to start from the linguistic sign 
and then proceed to its meaning (semasiological approach), whereas terminog- 
raphy starts from the concepts and then approaches the term (onomasiological 
approach). 

2. SL is descriptive, terminography prescriptive. 

3. According to terminology, each term has exactly one meaning. There is no poly- 
semy. SL, though, allows for one term to have several meanings. 

4. SL prefers an alphabetic presentation, whereas terminography prefers a systematic 
one. 

5. SL may also describe terms diachronically, whereas terminography is purely 
synchronic. 


These differences have been the subject of some controversy in the literature. While more
traditional approaches emphasized a strict dividing line between the two disciplines, more
recent works challenge this approach and go as far as equating the two (Bergenholtz and
Tarp 2010: 37). 

These oppositions should not be understood as binary, and there seems to be growing
agreement that there is at least a significant overlap between the two fields. This is 
because most of the oppositions can be refuted. For instance, it is by no means the case 
that specialized dictionaries are always alphabetically arranged. 

For further discussion, see Humbley (1997), Schierholz (2003), L’Homme (2006), 
Myking (2007), and Bergenholtz and Tarp (2010). 


24.2.4 Scientific and Technical Terms 


A number of different kinds of criteria have been suggested to define the concept of a 
scientific or technical term. We can broadly distinguish linguistic, referential, and socio- 
logical or sociolinguistic approaches (Béjoint 1988; Martin 2007; Mortureux 2009). 

Linguistic criteria are frequently unreliable for distinguishing a technical term from a 
general word. In fact, in most cases it seems more appropriate to speak of tendencies 
rather than criteria. For example, while most terms are nouns, there are many terms 
that do not belong to this word class (Béjoint 1988: 356). Similarly, while many terms are 
based on Latin or Greek roots and therefore have an international character, this does 
not apply to a technical term like German Fuchsschwanz (‘cross-saw’, from Fuchs ‘fox’
and Schwanz ‘tail’) or scientific coinages such as quark and free lunch. Monosemy is also
an unreliable trait: not only may a term such as field be part of the vocabulary of various
disciplines such as physics and mathematics (admittedly, some authors prefer to call
these cases of homonymy), but there are also cases of terms having different definitions
in one and the same field or sub-field, such as ring in algebra² (Eisenreich 1999: 1225). 
Also, there are a number of metonymical patterns that increase polysemy. For example, 
side is a geometrical object, but the word is also used to denote the length of such an 
object. 

Hüllen (1984) and other authors use a referential criterion to define special language 
and technical terms. In general language, long bone points to an everyday situation and 
simply refers to a bone considered to be long, whereas in biology long bone is precisely 
defined and points to a specific scientifically constructed model. The term is part of a 
coherent system of terms and to understand it, no reference to a concrete situation is 
necessary. 

Sociologically or sociolinguistically, it has been said that scientific or technical terms 
belong to a specialized community of scientists or professionals (e.g. Martin 2007: 29). 

As a consequence of the varied character of these kinds of criteria and the different 
degree to which they can be applied and relied upon, Béjoint (1988: 359) establishes 
a hierarchy of criteria which a term can fulfil to a greater or lesser extent, with ‘they 
occupy an important place in a specialized taxonomy’ at the top. Béjoint’s approach does
give the lexicographer some tools to decide whether to include a term or not, but he himself
concedes (1988: 360) that ‘some of the defining criteria are difficult to apply in practice’
and ‘it is not easy to list the criteria by order of importance’. Between general words and 
technical terms there remains a ‘fuzziness’ (Béjoint 1988: 361), which is also influenced 
by the fact that there are a number of scientific or technical terms that move from a spe- 
cial language to a general language. In this process their meaning is frequently broad- 
ened and the concept becomes more vague. 


² A ring is a set of elements on which two operations (often called addition and multiplication) are
defined and follow certain laws. Rings generalize the notion of integers. Some authors do not require
a ring to have a multiplicative identity, an element of the ring that leaves others unchanged when
multiplied with them. In the ring of integers, this is the number 1.

Scientific and technical terms can usually be arranged more or less neatly in hierarchies
or in semantic networks. While this is not a distinctive trait of terms and applies to 
general words as well, it is essential to consult experts from the respective fields in this 
and various other stages in the production. The lexicographer will frequently depend 
on experts to gain an overview of the conceptual structure of a subject field, to select the 
terms to be included, and to write the definitions and label the entries. 


24.2.5 Term Selection 


In the process of term selection the use of corpora has limited applicability; with 
respect to general dictionaries, Béjoint (1988: 366) says ‘the tradition of selecting 
words from a corpus is ill-adapted to scientific and technical words’ because corpora 
are most reliable in connection with the central vocabulary of a language, and less reli- 
able in connection with non-central vocabulary such as technical terms (1988: 361). 
While in the meantime it has been convincingly argued how special purpose corpora 
can be designed and put into practice (Bowker and Pearson 2002: chs 3, 4), the use 
of expert advice is essential at this stage. For example, a non-expert may consult a 
book on topology and identify various terms, but may not recognize that some of 
the concepts used in the text are of an algebraic nature, in which case a label such as 
‘Topology’ would be wrong. 


24.3 MONOLINGUAL SPECIALIZED 
DICTIONARIES, AND COVERAGE 
OF SCIENTIFIC AND TECHNICAL TERMS IN 
GENERAL MONOLINGUAL DICTIONARIES 


Not only has the number of scientific and technical terms increased enormously since the
twentieth century; as Landau (2001: 33f.) points out, general dictionaries also cover them
to a much greater extent than they used to. One reason for this is that new scientific and
technical concepts are presented to and discussed by a much larger public than was
previously possible. It is now common, at least for the bigger dictionaries, to point to their
coverage of scientific terms as a positive feature in advertising. 

The coverage of scientific and technical terms in general dictionaries also depends 
on historical and national factors, however. If we look back in history, the earliest editions
of the French Dictionnaire de l’Académie française renounced such terms because
the dictionary focused on language rather than things; it excluded regional and
archaic words as well as technical terms (Albrecht 1999: 1685f.). And even a cursory
look at some German dictionaries such as the Deutsches Universalwörterbuch or
Wahrig Deutsches Wörterbuch shows that they do not explicitly state how many terms
are included. Neither does the actual coverage seem to be as deep as in comparable
English dictionaries. For example, the Deutsches Universalwörterbuch does not include
the algebraic meanings of Körper and Gruppe, while most English dictionaries do 
include their counterparts field and group. 


24.3.1 Macrostructure of Specialized Monolingual 
Dictionaries 


Specialized dictionaries typically contain some of the following information, given here 
in a typical order:


• table of contents;
• preface;
• acknowledgements / list of contributors;
• list of subject fields, not infrequently as part of a list of abbreviations;
• usage guide;
• the dictionary;
• appendices.


In many specialized dictionaries the usage guide is extremely short. Sometimes it is 
even part of the preface and consists of just a few lines. For example, the Concise Oxford 
Dictionary of Mathematics (4th ed., 2009; CODM) has no separate usage guide, but 
some brief comments on the level of intended readers, on the entries, and the style in the 
definitions. This practice can be justified if the types of information given in the dictionary
are limited, for example in the case of encyclopaedic dictionaries which focus on the 
semantic content of the entry word. 

The usage guide and the preface should at least contain information on the intended
readership of the dictionary; they should explain what fields and sub-fields are covered and
to what extent, and they should give the reader a clear understanding of what information 
can be found in the entries and how they should be read. It must also be made clear 
how the book is arranged and where the terms can be found (for example in connec- 
tion with compound terms, which are particularly relevant for special fields). Landau’s 
International Dictionary of Medicine and Biology (1986; IDMB), for example, has a 
ten-page ‘Guide to the use of this dictionary’ which discusses at length the form of the 
entries, vocabulary, alphabetization, cross-references, usage information, etymologies, 
and abbreviations and symbols. 

The list of contributors is indicative of the comprehensiveness and accuracy which the
dictionary is intended to achieve. It is common to consult specialists, especially when establishing 
the wordlist and writing the definitions (see Section 24.2.4). 

In a general dictionary, the number of consultants for one single field is usually small,
perhaps consisting of one person only. The CODM, aimed at pupils, college students,
and first-year university students, lists two editors and five contributors, while in the case
of a fine-grained specialized dictionary aimed at experts the number of consultants may
reach several dozen, as in the IDMB. 

Some dictionaries offer various forms of additional information, which are suited to 
the subject field in question or are of a more general nature. As an example, the CODM 
offers fourteen appendices, including tables of areas and volumes, of integrals, math- 
ematical symbols, and Greek letters. Kucera’s Dictionary of Exact Science and Technology 
(2002) includes an article by Gunter Neubert on term creation in English and German. 
The reasons are indicated by Neubert (1990: 70f.): 


Users do not seem to be very familiar with the principles of term creation, and lexi- 
cographers should make it easier for the user to create and translate new terms. 


In terms of the coverage, specialized dictionaries tend to contain types of words which 
will in most cases only be found in the biggest general dictionaries such as the Oxford 
English Dictionary (OED). In particular, such dictionaries may contain biographical 
entries, terms containing names (eponyms) such as Abelian, names of theorems (CODM
has an entry for Fundamental Theorem of Arithmetic, for example), and symbols (e.g.
the Encyclopedic Dictionary of Polymers (2nd ed., 2011) has the entry σ for the standard
deviation, while CODM prefers transliterating the symbols and thus has an entry
aleph-null rather than ℵ₀). 


24.3.2 Microstructure of Specialized Monolingual Dictionaries 


A specialized microstructure typically contains some of the following elements: 


• orthography, syllabification, hyphenation;
• pronunciation;
• grammatical information such as word class or gender;
• subject field markers and various kinds of labels (e.g. regional ones);
• definition;
• encyclopaedic information;
• example sentences;
• pragmatic information;
• illustrations;
• bibliographical references and internet links;
• historical information.


If the dictionary is encyclopaedic rather than linguistic, the microstructure tends to be 
much simpler. 


24.3.3 Defining and Labelling of Scientific and Technical 
Terms in Specialized and General Dictionaries 


Defining lexical items is at the heart of lexicography, whether general or specialized. In 
connection with scientific and technical terms, a distinction has been made by Sidney 
Landau (1974), following Robinson. According to him, the meanings of general words 
are extracted from a body of evidence, and the words are defined on the basis of actual 
usage as illustrated by the corpus. The meanings of technical terms in specialized dictionaries
are, as he says, imposed on the basis of expert advice. In this view, the definition
of a technical term shows how a word should be used (prescription) rather than
how it is used (description). 

Definitions of both scientific and general terms commonly begin by indicating the 
general conceptual category to which a concept belongs (the genus), followed by delimiting
characteristics (the differentiae) which differentiate the concept from other
members of the genus. This is the classical approach to definitions, as illustrated by the
definition of triangle in the OED: ‘A figure (usually, a plane rectilineal figure) having
three angles and three sides.’

The precise wording is often difficult, and there are differences between a general and 
a specialized reader to which the lexicographers must pay due attention. 

In general, scientific terms must be defined precisely. The general lexicographer can- 
not strictly adhere to scientific standards, though, because this would in many cases 
make the definition incomprehensible for the general reader. Often, a scientifically 
acceptable definition will violate the closure of the dictionary in that the actual word- 
ing of the definition may contain terms not found elsewhere in the dictionary. We have 
already discussed an example of this (see Betti number in Section 24.2.2 above). 
Béjoint (1988: 364) points out some national characteristics: American dictionaries 
seem to strive to be more scientific than British ones. 

There are both similarities and differences in the way terms are defined in a general 
dictionary and in a specialized dictionary. As one extreme case, some specialized dic- 
tionaries do not define some elementary terms at all, a procedure unacceptable in a gen- 
eral dictionary. For example, the CODM does not define triangle at all (but it does give 
extensive encyclopaedic information on the concept). 

The genus is sometimes identical. For example, the algebraic term group is defined in 
the Oxford Dictionary of English (rev. ed., 2005) as ‘a set of elements, together with an 
associative binary operation, which contains an inverse for each element and an identity 
element’. The genus set is also used by specialized dictionaries such as James’ Mathematics
Dictionary (1992); set has the advantage that it is both a word used in general language
and a term, and the specialized meaning does not differ widely from its general one. 
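
For comparison, the same content can be set out in the symbolic style that a specialized dictionary might allow itself. What follows is a standard textbook formulation, given only as an illustration and not quoted from any of the dictionaries discussed; the symbols G, e, and the notation for inverses are conventional choices introduced here.

```latex
A group is a set $G$ together with a binary operation
$\cdot : G \times G \to G$ such that
\begin{align*}
  (a \cdot b) \cdot c &= a \cdot (b \cdot c)
    && \text{for all } a, b, c \in G \quad \text{(associativity)},\\
  e \cdot a = a \cdot e &= a
    && \text{for some } e \in G \text{ and all } a \in G \quad \text{(identity)},\\
  a \cdot a^{-1} = a^{-1} \cdot a &= e
    && \text{for each } a \in G \text{ and some } a^{-1} \in G \quad \text{(inverses)}.
\end{align*}
```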

However, when the terms are more abstract, the genus tends to be broader in general
dictionaries. For example, the OED defines a probability space as ‘a space each point
of which is an outcome and has a probability associated with it’. In contrast, CODM
defines the same term as ‘A finite *measure space with associated probability measure
that assigns unit measure to the complete space’. The comparison shows that space,
chosen in the general dictionary, is a very broad genus. It is not a mathematical term,
whereas the genus measure space is much less general than space and is a mathematical
term (the asterisk functions as a cross-reference). In short, in specialized dictionaries
it is the genus proximum, the next-highest genus, that is more likely to be chosen in the 
definition. 

Because they may assume more specialized knowledge on the part of their readers,
specialized dictionaries may use symbols and formulae more widely. A denominator is
usually defined as the number below the line in a fraction; the Dictionary of Classical
and Theoretical Mathematics (2001) simply says it is ‘The number b in the fraction a/b’.
Symbolic language is occasionally used in general dictionaries; complex formulae usually
are not. In specialized dictionaries, however, they allow for greater precision. For
example, derivative is defined in the CODM: ‘For the *real function f, if
(f(a+h)−f(a))/h has a *limit as h→0, this limit is the derivative of f at a and is
denoted by f′(a).’ This wording does not completely meet the standards of a
mathematically valid definition, but it is a clear and correct definition that the
users of the dictionary will find both understandable and applicable. A style familiar to
mathematicians is also found in the Dictionary of Classical and Theoretical Mathematics, s.v.
dyadic compactum: ‘Let X be the discrete space with two points...’ 
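
In displayed notation, the CODM wording for derivative discussed in this paragraph amounts to the familiar limit below. The domain condition in the first line is an assumption added here to indicate what a stricter formulation would typically state; it is not part of the dictionary entry.

```latex
Let $f$ be a real function defined on an open interval containing $a$. Then
\[
  f'(a) = \lim_{h \to 0} \frac{f(a+h) - f(a)}{h},
\]
provided that this limit exists.
```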

Scientific classifications may differ significantly from general ones. Béjoint 
(1988: 365) notes that for scientists, the sun is a star. A definition aiming at a higher 
degree of scientificity may choose star as the genus, but this may be confusing for a gen- 
eral reader because in a folk taxonomy the sun is not classified as a star. Compare Hanks 
(this volume). 

While it may be correct to say that abstract words ‘are not easily defined by the ‘genus 
+ differentiae’ style’ (Jackson 2002: 94), note that resorting to synonyms, a frequently 
used technique in general dictionaries, creates the danger of circularity (Jackson 
2002: 94) and often it is simply not applicable (Landau 2001: 164-6). However, the more 
specialized the dictionary is, the less both these points apply, as more liberties
can be taken without impeding comprehensibility and correctness. See Kwary (2011) for 
a typology of definitions in specialized dictionaries. 

In general dictionaries, the subject label or a sense discrimination must always be 
given. For example, in the online version of the American Heritage Dictionary the entry 
for alternating group reads: ‘A group consisting of all possible even permutations of a 
given number of items’. Without a label, the reader does not even know that the term is
mathematical, and would struggle to find the right meaning under group. 

Elementary terms that also occur frequently in general language must be said to be 
labelled somewhat irregularly. For instance, circle or line are not labelled as mathemati- 
cal terms at all in various general dictionaries, whereas pyramid frequently is. 

Stylistic labels such as ‘informal’ do not apply to technical terms to the extent they do
to general words. However, they are occasionally used: Elsevier's Economics Dictionary
(2001), for example, uses Collog (colloquial) and Derog (derogative). Thus, the entry for 
money has the synonyms roll and folding money thus marked, which shows that if such 
labels are used, they can only be applied if the dictionary also includes general language
words not belonging to the strictly technical vocabulary. 

Note that there are a number of topics and terms in fields such as ecology and politics 
which are controversially discussed, not just by specialists but also by the general public. 
Also, in some disciplines different schools of thought may define the same term in dif- 
ferent ways. The terms may thus be affected in two ways: euphemistic terms, if included 
in a dictionary, should clearly be marked as such and possibly commented upon, and 
definitions pertaining to specific schools should similarly be marked and explained. 

Various specialized dictionaries allow for more prescriptive elements than are com- 
mon in general lexicography. In some dictionaries, this policy is explicitly stated (e.g. 
A Dictionary of Epidemiology 2008: ix). Norman (2002: 270) argues that ‘prescription
should be explicit and transparent’. This practice is often not followed, but there are
exceptions. The Geologisches Wörterbuch (12th ed., 2010) comments, under tektonische
Gesteinsfazies (petrographical facies), that the word Tektofazies is less to be recommended
because it may be confused with the Greek word feko.


24.4 BILINGUAL AND MULTILINGUAL 
DICTIONARIES 


24.4.1 Bilingual Specialized Dictionaries, and Coverage 
of Scientific and Technical Terms in General Bilingual 
Dictionaries 


Bilingual dictionaries can be unidirectional or bidirectional. In the first case, they con- 
tain only one wordlist such that the translation equivalents can only be accessed from 
one language. According to Bergenholtz and Tarp (1995: 52), most specialized dictionaries
are of this type. In the second case, they contain two wordlists, one for each language
involved. As Yong and Peng (2007: 76) point out, while such dictionaries ‘can
serve both “comprehension” and “production”, they are usually bulky, inconvenient to
use and expensive to buy’.


24.4.1.1 Equivalents and Anisomorphism 


The principal aim of a bilingual dictionary is to give equivalents, words, or expressions 
corresponding semantically to a word or expression in another language. Equivalence 
can only rarely be said to be full, although full equivalence is more likely with
scientific or technical terminology than elsewhere.

Anisomorphism between languages interferes with equivalence (see Adamska, this 
volume). With respect to special languages, it is possible that a terminological system 
simply does not exist in one of the languages involved. Werner (1999: 1868) illustrates 
this by pointing out that there is no equivalent terminology in Greenlandic for Spanish 
terms for coffee cultivation. It is also possible that the same subject fields exist in both
languages, but the terminologies are not identically structured. This is the case of cul- 
ture-dependent fields such as law. 

Yong and Peng (2007: 139) speak of specialized anisomorphism in connection with rare 
technical terms and newly created terms. Such terms do not necessarily lend themselves 
to equivalence because of the possible absence of an entity or a term in the target lan- 
guage. They point out that especially in computer science some English terms ‘have not 
found their counterparts or proper equivalents in Chinese’ (Yong and Peng 2007: 139). 


24.4.1.2 Information in the Dictionary 


According to Atkins (2002b: 7), ‘the ideal dictionary should be tailored, or at least tailorable,
to one particular type of user’. Bilingual dictionaries are produced with the needs of
specific users in mind, and their major functions involve text production and reception
in the foreign language, and translating from the native language into the foreign language
and vice versa (see Bergenholtz and Tarp 1995: 20-8). The kinds of information
are determined by which of these functions a dictionary is supposed to fulfil. 

For practical and economic reasons, specialized bilingual dictionaries are often pro- 
duced with various user groups in mind. Some authors argue that such polyfunctional- 
ity should generally be aimed at (Rossenbeck 2005: 189). 

With respect to general dictionaries, it has been claimed (Yong and Peng 2007: 24) that 
bilingual dictionaries tend to include fewer technical terms than monolingual ones. 
While exact figures are not easy to determine, the claim is supported by the fact that the 
bilingual dictionaries consulted do not advertise their coverage of such terms to nearly 
the same degree as many monolingual ones. 

Bilingual dictionaries tend to focus on usage rather than meaning (Bowker and 
Pearson 2002: 139); they generally provide fewer definitions than monolingual dic- 
tionaries. Sometimes, this is done at the cost of making a dictionary entry almost use- 
less. For example, Tarp (2003: 111) cites a dictionary which gives five entries for the 
Danish term kapitalinteresse, with each entry giving exactly one English equivalent. If 
a dictionary does not comment on the differences in any way, the user does not know 
which of the equivalents to choose. As a mathematical example, the Mathematisches 
Fachwörterbuch (3rd ed., 2007), under Körper, gives the four equivalents field, body,
solid, and domain, without any labels or semantic distinctions. Again, this is not very 
informative as the first term is algebraic, the second is rather uncommon in mathemat- 
ics and refers to a physical body, the third is geometric, and the fourth, an extremely 
polysemous term, may be ring-theoretical (algebraic), topological, or refer to the 
domain of a function. 

Labels also help the user find the right equivalent (but even in the same sub-field one 
term may have several meanings). A label like ‘Math’, however, may also be much too
general even for a general dictionary: in the Oxford-Hachette French Dictionary (4th
ed., 2007) under field we find ‘Math champ’. While it might be argued that this is a problem
of coverage (the dictionary may not cover algebra), the translation champ refers to
another meaning of field which is not its principal one. 

As pointed out above, the kinds and amount of information given in the dictionary
entries strongly depend on the intended function of the dictionary.

For example, if the dictionary is intended to assist the user in the translation 
from the native language into the foreign language, or to assist in the production of 
foreign-language texts, there should be information on how the equivalents are used in 
the foreign language. This includes a wide range of linguistic information such as gram- 
mar, morphology, examples, and collocations. If several equivalents are indicated, it 
must be made clear in what ways they differ. Werner (1999) systematically discusses the 
information required in specialized dictionaries with respect to lexicographic functions. 

Various authors have argued that specialized dictionaries should also explain terms 
in their specialized context (Bergenholtz 1994: 54). Bergenholtz and Tarp (1995: 117-26) 
recommend the inclusion of a large amount of phraseological information, especially 
for the purpose of text production. Many German users, for example, are unsure as to 
whether on or in is the preposition to use with Internet. But there is also encyclopaedic 
information that supports the user in this purpose (Rossenbeck 1994). Note that these 
points are also debated in connection with monolingual dictionaries (Pearson 1998: 71; 
Collet 2004). 

Norman (2002: 270) argues for the inclusion of ‘relatively general lexis, methods-related
lexis, and non-noun lexis’. Some specialized dictionaries are already practising
this to some extent. For example, the Mathematisches Fachwörterbuch (3rd ed., 2007) has
entries for conjecture or argument. While this is useful for both production and reception,
including such terms raises issues that need to be solved. For example, argument is
also a mathematical term, and the two translations German Beweisgrund (the general 
word) and German Argument (which translates both the general word and the math- 
ematical term) must be clearly distinguishable. Also (moving away from this particu- 
lar dictionary), the word Vermutung (‘conjecture’) shows that as part of a compound, 
the translation may be rather unpredictable: while Poincarés Vermutung corresponds
to Poincaré's Conjecture, Riemannsche Vermutung corresponds to Riemann hypothesis,
and Fermatsche Vermutung to Fermat's Last Theorem. Finally, for the purpose of produc- 
ing text in a foreign language it would certainly be useful if specialized dictionaries paid 
more attention to word classes other than nouns. For non-native speakers, it is by no 
means obvious how to use verbs such as cut or intersect (which are common in math- 
ematics) correctly, and in what ways they differ in mathematical contexts. 


24.4.2 Multilingual Dictionaries 


With multilingual (also called plurilingual) dictionaries, the issues discussed in con- 
nection with bilingual dictionaries increase. Such dictionaries can ideally replace 
a large number of bilingual dictionaries, depending on how many languages are cov- 
ered, but they may also reach huge proportions and have a complicated microstruc- 
ture (Bergenholtz and Tarp 1995: 54). As Pearson (1998: 71f) states, they ‘are even less 
informative than bilingual specialized dictionaries’, meaning that they tend to include
source and target terms only. Occasionally, editors acknowledge this explicitly, such
as Eisenreich, who in the preface to the Wörterbuch Mathematik (1982) says that while
definitions are indispensable, for reasons of space they are impossible to give. Zgusta 
(1971: 214) already pointed out that perhaps the only domain in which their existence 
is justified is technical terminology, because it is here that equivalence may most real- 
istically be achieved. Considering that polysemy is a common phenomenon, however, 
Zgusta’s statement may have been too optimistic. While the production of general mul- 
tilingual dictionaries has declined in the twentieth century, specialized multilingual 
dictionaries continue to be produced even in culture-dependent fields in which the 
problem of equivalence is even bigger than in other areas. 

An example of a mathematical dictionary with four languages is Elsevier's Dictionary 
of Mathematics (2000). As is typical of multilingual dictionaries, the entries in the
principal word list are numbered so that under ‘4306 field; commutative field’ we find
German ‘[kommutativer] Körper m’, French ‘corps m [commutatif]’, and Russian ‘поле
n; коммутативное тело n’. Secondary wordlists, in this case, in French, German, and
Russian, then allow non-English speakers to find the right equivalents. Because of the
complexity of such dictionaries, this cross-reference system is error prone. For example,
under German Körper there is a reference to the article commutative division
ring, but under French corps there is none. The dictionary also indicates verbs and uses 
labels for British and American English. Few multilingual dictionaries offer a richer 
microstructure. 
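
The kind of inconsistency just described can in principle be detected mechanically once the numbered entries and the secondary wordlists are available as data. The following is a minimal sketch in Python under stated assumptions: the data layout, the function name missing_references, the entry number 9001, and the equivalents listed under it are invented for illustration and are not taken from the dictionary itself.

```python
# Hypothetical miniature of a numbered multilingual dictionary.
# Principal word list: entry number -> equivalents per language.
# Entry 9001 and its equivalents are invented for this illustration.
ENTRIES = {
    4306: {"en": ["field"], "de": ["Körper"], "fr": ["corps"]},
    9001: {"en": ["commutative division ring"], "de": ["Körper"], "fr": ["corps"]},
}

# Secondary wordlists: term -> entry numbers it refers back to.
# The French list deliberately omits 9001 to imitate the gap noted in the text.
SECONDARY = {
    "de": {"Körper": [4306, 9001]},
    "fr": {"corps": [4306]},
}


def missing_references(entries, secondary):
    """List (language, term, entry number) triples that a secondary
    wordlist ought to contain but does not."""
    missing = []
    for number, equivalents in sorted(entries.items()):
        for lang, terms in equivalents.items():
            if lang not in secondary:
                continue  # the principal language has no secondary wordlist
            for term in terms:
                if number not in secondary[lang].get(term, []):
                    missing.append((lang, term, number))
    return missing


print(missing_references(ENTRIES, SECONDARY))
```

On this toy data the check reports the French term corps as lacking a reference to the invented entry 9001, which mirrors the asymmetry between the German and French wordlists.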

See Rossenbeck (1991) for a discussion of problems relating to multilingual special- 
ized lexicography. 


24.5 CONCLUSION AND FUTURE PROSPECTS 


Specialized lexicography has long been neglected by lexicographers (Rossenbeck 
2005: 179f.), to the point that Fuertes-Olivera (2012: 96) claims that ‘interest in special- 
ized lexicography was almost non-existent’ until the publication of Bergenholtz and 
Tarp (1995). In the meantime, this situation has improved, and pedagogical specialized
lexicography has received particular interest (Fuertes-Olivera and Arribas-Baño
2008; Fuertes-Olivera 2010). While it is overall true to state that ‘specialized dictionaries
and dictionary compiling have changed, [but] they need to be developed in a number
of ways’ (L'Homme 2006: 182), pedagogical specialized lexicography is perhaps
the field which displays the most interesting practical developments. For example, 
the Fachwortschatz Medizin Englisch (2nd ed., 2007), a thematically arranged bilingual
(German-English) medical dictionary, offers a wealth of information, including
extensive definitions, authentic examples, collocations, and phrases, all of which are useful
to aid the user in the production of medical English. The book is additionally conceived
as a ‘Sprachtrainer’ (‘language trainer’), since each entry cross-refers to
a large number of related terms. For example, under ointment the reader is not only 
referred to English terms such as paste, cream, and balm, but also to German terms such
as Verbandsmull (‘absorbent gauze’) or Feuchtigkeitscreme (‘moisturizer’). Temmerman 
(2003) discusses two French specialized dictionaries in which some similarly innovative 
methods are used. 

The full potential of such reference works is unlocked by their electronic versions, 
since looking up the enormous number of cross-references is very time-consuming 
in the paper versions. While most electronic dictionaries today are usually electronic 
versions of their printed counterparts (Sanchez 2009: 188), dictionaries such as the 
ones just mentioned offer exciting future perspectives. They could even involve their 
users by letting them decide which information they wish to display, thus allowing for 
multi-purpose dictionaries without overloading the content. 

Finally, a note on historical specialized lexicography. Both in terms of 
dictionary-making and dictionary research this area has been much neglected. In par- 
ticular, dictionaries covering the origins and histories of scientific terms are scarce, a 
work such as the Historisches Wörterbuch der Biologie (Historical Dictionary of Biology,
2011), which gives detailed information on the history of biological terms and concepts, 
being one of the few exceptions. 


25.1 INTRODUCTION


The productive, potentially infinite, combinability of discrete, minimal units into
larger ones is a defining hallmark of human language. Words are the building blocks 
for phrases and sentences, and the encoding and decoding of messages requires access 
to that component of human grammar where words (or, more formally, form-meaning
pairs) reside. The lexicon is large by any count and its boundaries are fuzzy. Both
linguistic theorists and lexicographers tend to agree that efficient storage and look-up of 
words and their meanings require economy; multi-word sequences that obey the prin- 
ciples of well-formedness and semantic compositionality probably have no lexical status 
and merit inclusion neither in speakers’ mental lexicons nor in lexical resources com- 
piled by lexicographers. For example, phrases like weather forecaster or extreme winters, 
though they may be frequent, can be composed and understood on the fly, while the 
meanings of phrases like fire sale and lend a hand are not straightforwardly (de)com- 
posed. For this reason, such semantically idiosyncratic phrases constitute lexical units 
despite their multi-word make-up. However, we shall see that the boundary between 
multi-word units (MWUs) with lexical status and freely composed phrases is often not 
clear-cut. 

Analyses of spoken and written corpora reveal a high percentage of MWUs, includ- 
ing collocations and idioms, both in terms of types and tokens (Jackendoff 1997; 
Moon 1998; Cowie 1992). MWUs are a compelling subject of study not only because 
of their pervasiveness and universality, but also because they challenge any defini- 
tion of ‘word’ and resist clear-cut criteria for integration into both speakers’ mental 
lexicons and lexical resources. On the one hand, speakers recognize MWUs as lexical 
units paired with distinct meanings, a fact reflected in traditional lexicography and
the ‘classical’ view of MWUs as merely long, fixed ‘prefabricated’ lexemes. On the 
other hand, corpus data reveal that many MWUs show considerable variation, and 
they present strong evidence that many phrasal units are subject to regular morpho- 
syntactic processes that may operate on the phrase as a whole or on internal con- 
stituents, independent of their semantic transparency. Semantic wholeness in the face 
of rich grammatical properties makes the treatment of MWUs in lexical resources a 
challenge. Before addressing the lexicographical treatment of MWUs, we distinguish 
and define several key concepts. 


25.2 CO-OCCURRENCE, SELECTIONAL 
PREFERENCE, AND COLLOCATION 


Words are selective about their context. For example, an English speaker brushes his 
teeth, unlike his French counterpart, who washes them. And English speakers go off on 
a tangent more often than they take off on a tangent. We talk about a confirmed bach- 
elor rather than an affirmed bachelor, and disasters are unmitigated but rarely, if ever, 
unrelieved or undiminished, although the less usual phrases would be understood by a 
cooperative listener. 

The idiosyncrasies of lexical selection are reflected in the regular and statistically discernible
phenomenon of collocation (Firth 1957b; Sinclair 1991; Stubbs 2001; Partington
2004; McEnery and Wilson 1996, inter alia). Church and Hanks (1990) demonstrated 
the measurability of collocational properties with Mutual Information—a statistical 
measure based on corpus-based co-occurrence frequencies—that quantifies the idi- 
osyncratic attraction of wordforms to one another beyond syntactic and semantic con- 
straints imposed by subcategorization and selectional restriction rules. Thus, while we 
understand the meaning of the phrase powerful tea, corpus frequency data show that the 
noun tea overwhelmingly prefers to select strong as the adjective to express the appro- 
priate meaning. 
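
The measure can be illustrated with a small calculation. The sketch below, in Python, uses the standard definition of pointwise mutual information, I(x, y) = log2(P(x, y) / (P(x) P(y))), with probabilities estimated from raw counts. It is a simplification under stated assumptions: only adjacent word pairs are counted, whereas Church and Hanks (1990) allow a wider window; all counts are normalized by the corpus length; and the toy corpus and the function name mutual_information are invented for illustration.

```python
import math
from collections import Counter


def mutual_information(tokens):
    """Return {(x, y): score} for every adjacent word pair in tokens.

    score = log2( P(x, y) / (P(x) * P(y)) ), with probabilities
    estimated as counts divided by the number of tokens.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), f_xy in pair_freq.items():
        p_xy = f_xy / n
        p_x = word_freq[x] / n
        p_y = word_freq[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Invented toy corpus: 'strong tea' occurs twice, 'powerful tea' once.
TOY_CORPUS = ("strong tea strong tea strong coffee "
              "powerful engine powerful engine powerful tea").split()

scores = mutual_information(TOY_CORPUS)
print(round(scores[("strong", "tea")], 3))
print(round(scores[("powerful", "tea")], 3))
```

On this toy corpus strong tea receives a higher score than powerful tea, in line with the preference described above. Scores from so small a sample carry no statistical weight; real applications rely on large corpora and usually discard low-frequency pairs.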

Statistical analyses of the phenomena of collocation make it possible to discover collocations
(e.g. Bond et al. 2003; Baldwin et al. 2003; Fazly and Stevenson 2006; Fazly et al.
2009; Evert 2004/2005; Schone and Jurafsky 2001, inter alia) and to quantify the strength
of their co-occurrence as well as the degree of their lexical and syntactic fixedness, a
measure of their lexical status. Such analyses show that there are no hard rules to distin- 
guish between merely preferred co-occurrences and more or less fixed collocations that 
arguably have lexical status. Rather, co-occurrence preferences are situated on a con- 
tinuous scale of fixedness, and consequently, there is no clear cut-off point that would 
decide when a phrase qualifies as a unit and for inclusion in the lexicon. Selectional 
preferences remain a challenge for learners and for machine translation systems, which 
may successfully identify the appropriate meaning but fail to pick out the most felicitous
wordform. 


25.3 COLLOCATIONS, SUPPORT VERB 
CONSTRUCTIONS, AND IDIOMS 


MWUs are characterized not only in terms of their collocational strength but also in 
terms of their formal, morphosyntactic properties and their semantic compositionality, 
that is, the extent to which the meaning of the phrase derives from the meanings of its 
constituents. We present a brief typology for MWUs in contemporary English, focus- 
ing on several sub-classes of collocations and idioms. All are statistically significant 
co-occurrences of specific lexical items and fall along a sliding scale of syntactic fixed- 
ness and semantic non-compositionality. 


25.3.1 Collocations 


Collocations—multi-word combinations that show strong collocational attraction— 
include noun compounds (laptop computer, book sale), light and support verb construc- 
tions (have a drink, take a picture, make a fuss), syntactically marked phrases such as 
the prepositional phrases consisting of a preposition and a bare noun (in school, to prison), and
verb phrases like answer the door. 


25.3.2 Noun Compounds


Many noun compounds are semantically opaque (speed trap, ski bum) and are clear candidates
for inclusion in a dictionary, where they are treated as units or ‘long words’. But
such compounds, although often semantically idiosyncratic, are highly productive. The
semantic relation between their members can vary, but speakers appear to have no difficulty
in understanding them. Thus, a fire sale is a sale that takes place because of a fire;
during a bake sale, baked goods are sold (usually for fund-raising purposes); and garage 
sales and yard sales occur in garages and yards. In each case, the semantic composition is 
quite different, though the meanings of the members of the compound are transparent. 
New compounds are often formed on the basis of a pattern.

Many compounds are short-lived and are closely bound to a specific context; banana 
war is readily understandable when used in discussions of nations competing to 
export their fruit and helicopter mother is interpretable only in the context of the cur- 
rent public debate on child rearing. There is no clear-cut answer as to whether or not 
such compounds should be included in lexical resources and coverage differs across
dictionaries. 


25.3.3 Prepositional Phrases 


Some collocations are candidates for inclusion in lexical resources due to their syntactic 
idiosyncrasy. These include the class of Prepositional Phrases exemplified by in class/
out of school/to college/in prison/jail/hospital. The singular noun is always bare, and no
adjectival modification is possible, that is, no lexical material can intervene between the
preposition and the noun: *in boring class/*out of high-security prison/*in local hospital.
This collocational pattern is productive to a limited degree: 


in school/in graduate school/in medical school/in kindergarten/in college 
in chemistry/Spanish class 

in jail/*in penitentiary/*slammer/*workhouse 

in hospital/*infirmary/*clinic 

in court/*in Supreme Court 

in bed/*on sofa 


Because the patterns are not fully productive, dictionary entries for nouns like court and 
jail need to indicate their use in such phrases.¹


25.3.4 Support and Light Verb Constructions 


Support Verb Constructions (SVCs) or Light Verb Constructions (Grimshaw and
Mester 1988; Kearns 1998/2002, inter alia) are syntactically well-formed verb phrases
that are semantically compositional but exhibit strong lexical preferences.²

SVCs consist of a ‘light’ or support verb and a complement that can be an NP, a PP, a 
double object, or an NP-PP: 


give an explanation, have a drink 
set on fire, put on hold 
give the floor a sweep, keep someone company 


take account of something, make a call to somebody 


1 British and US English differ slightly in their use of these phrases; thus in hospital is not common
among American English speakers. 

2 The distinction between SVCs and LVCs is not clear-cut and we will ignore it here, referring to all 
MWUs under consideration as SVCs.
The choice of verb in such phrases does not admit any lexical substitution, even of arguably
close synonyms, without a change or loss of meaning. For example, the verb in the phrase 
take a bow, meaning to accept applause or acclaim, does not admit of the substitute make; 
the meaning of set in motion is lost in the phrase place in motion, and while give someone 
company seems intuitively plausible, the common phrase is keep someone company.³

Unlike collocations such as brush one’s teeth, which seem to be unsystematically dis- 
tributed across the lexicon, SVCs constitute a syntactically, lexically, and semantically 
well-defined class found in many languages (see, for example, Grimshaw and Mester’s 
(1988) discussion of Japanese suru constructions, and similar constructions in Persian). 
Other, related, examples are give chase/voice to, and constructions of the form have N, 
where the noun is a bare verb stem, for example have a drink/read/smoke/look. No satis- 
factory account has been offered so far for the productivity of these phrases and its limi- 
tations (but see, for example, Wierzbicka’s (1982) discussion of the constructions have a 
drink vs. *have an eat).

Most of the prepositional phrases and SVCs discussed here may be lexically or syntactically
idiosyncratic, but they are usually semantically decomposable (‘encoding’ in
Fillmore et al.'s (1988) terminology) and can be readily interpreted by speakers unfamiliar
with the phrasal units in the appropriate context. However, the choice of the particular
verb in these constructions to the exclusion of other verbs is unpredictable and
requires that SVCs be represented in lexical resources as part of the entry for the noun. 


25.3.5 Idioms 


We next consider different classes of Verb Phrase idioms; each poses different challenges 
for adequate lexicographical representation.⁴


25.3.5.1 Idioms with Irregular Phrase Structure 


Some idioms are syntactically idiosyncratic and cannot be assigned to a phrasal category.
Their constituents are semantically transparent lexemes, but they are ‘unfamiliarly
arranged’, in the words of Fillmore et al. (1988). While their phrasal irregularity makes
these MWUs clear candidates for inclusion in lexical resources, they are also fixed and
constitute syntactic units, making their treatment straightforward. Examples are:


nothing doing/nothing much doing (cf. *nothing was doing/*nothing was done) 
and then some (cf. *and then more)


3 Some state-denoting complements co-occur with different verbs, reflecting regular aspectual
alternations between causative, inchoative, and stative: put/be/keep on hold, put/be on fire; however, not
all SVCs show all alternates: *go/*get/*keep on fire.

4 Fixed expressions of the following kind will not be considered here: proverbs like the early bird gets
the worm, phrases like mum is the word, the shoe is on the other foot, and when the cows come home/when 
hell freezes over, and routine formulae that often have a pragmatic point (Fillmore et al. 1988) such as have
a good one, take care, let’s not go there, and similes such as sharp as a whistle. 


416 CHRISTIANE FELLBAUM 


say when (cf. *say what/*when was said) 
all of a sudden (cf. *some of a sudden) 
by and large (cf. *by and very large) 


Fillmore et al. (1988) further distinguish constructions where ‘unfamiliar pieces are
unfamiliarly arranged’. Such ‘decoding’ idioms are non-compositional and unanalysable.
The ‘unfamiliar pieces’ in some MWUs could be considered to be so-called
‘cranberry morphemes’, whose distribution is strictly limited to the idiom, as in kith and
kin, spic and span. Speakers do not usually assign a meaning to the nouns outside of the 
idioms (though kin of course has an independent meaning). Because these expressions 
cannot be assigned to a lexical or syntactic category, the idioms are syntactically irregu- 
lar and, consequently, frozen: 


kith and kin (cf. *kin and kith, *kith and relative) 
spic and span (cf. *spic and very span) 


Adequate lexicographical treatment would reflect the structural unity and fixedness of 
these phrases, treating them as a single lexical item. 


25.3.5.2 Partially Lexically Filled Idioms 


Many idioms are verb phrases with noteworthy syntactic properties (see Lebeaux 
1988, 2000 for a comprehensive classification). A large number of verb phrase 
idioms are discontinuous, where one internal argument is lexically unspecified 
(Fillmore et al. 1988 refer to these as ‘formal idioms’). These include the many 
English idioms with a possessive bound to the subject (e.g., have one’s cake and eat 
it, be on one’s last legs, blow one’s stack) or a pronoun bound to a lexically free noun: 


cook s.o.'s goose
give s.o. the slip/a hand
take advantage of s.o.
put s.o. on a pedestal


These idiomatic constructions are syntactically regular and constitute semantic units, 
yet they are not fully spelled out lexical units. Their entries need to clearly indicate not 
only the expression’s structure and meaning but also the place of the open position. 
Thus, a noun phrase with possessive morphology is part of the idiom cook someone's 
goose; this possessive cannot appear in an appositive structure without a loss of the idi- 
omatic reading: 


cook Peter's goose


*cook the goose of Peter 


However, a Beneficiary argument can usually be syntactically realized either as an indi- 
rect object or in an adjunct: 


give the slower students a hand 
give a hand to the slower students 
give the police the slip 

give the slip to the police 


25.3.5.3 Schematic Idioms


Fillmore et al. (1988) and Kay and Fillmore (1999) provide a fine-grained analysis of 
so-called ‘schematic idioms’. These are specific syntactic configurations characterized
by the presence of a few lexical items (usually function words) and specific meanings. 
Examples are the X-er the Y-er and not X let alone Y. The range of lexemes they admit 
in the unfilled slots is highly constrained by the meaning of the schema. Thus, the two 
lexemes framing let alone must be in some kind of scalar relation, with Y expressing a 
greater value than X (the water in the hotel was not warm, let alone hot; I can’t pay the rent 
let alone make a downpayment on the apartment). 

Schematic idioms are syntactically, lexically, and semantically irregular; their proper- 
ties must be associated directly with the construction and may be extremely complex, 
as in the case of the ‘Mad Magazine Construction’ (Him a doctor?). These constructions
are fixed and show no syntactic variation, although they allow a range of lexemes whose 
meanings are constrained by the meaning of the construction. As Fillmore et al. (1988) 
show in almost forty pages of discussion, a comprehensive account of the lexical pro- 
ductivity and its limitation requires a description that exceeds the format of conven- 
tional dictionary entries. 


25.3.5.4 Verb Phrase Idioms with Modified Phrase Structure 


A large number of idioms are Negative Polarity items (NPIs); in the absence of a nega- 
tion expression, these phrases lose their idiomatic meanings (e.g. Lichte and Sailer 
2004). Examples are: 


not give a fig/damn/hoot 
not have a leg to stand on 
horses wouldn't get NP to VP 
not be quite right in the head 
The challenge for adequate lexicographical treatment of these MWUs is to convey the
obligatoriness of the negation while allowing for variation in the specific choice of 
negation: 


nobody gave a fig about the victims 
he never had a leg to stand on


I don't know whether he is quite right in the head 


A number of NPI idioms are headed by modal verbs, and the absence of an idiom-specific 
modal changes the meaning of the phrase: 


won't hear of it (cf. I don’t hear of it)
can't take it with you (cf. she didn't take it with her) 


The lexicographer must convey this constraint in the entry of the phrase; the considerable 
variability of the negation is particularly challenging. 


25.4 SEMANTIC COMPOSITIONALITY 


Like collocations, idioms vary in the extent to which they are lexically and syntactically 
fixed. But unlike collocations, which are compositional, idioms are semantically opaque to 
different degrees. The meaning of syntactically well-formed idioms like kick the bucket, bite 
the dust, rock the boat, and hit the ceiling does not arise from the meanings of their con- 
stituents and cannot be guessed by speakers unfamiliar with the expression. Consequently, 
such semantically non-compositional expressions must be represented as units in lexical 
resources. However, their representation as strings that suggests fixedness does not do 
justice to the considerable variation with which speakers use these MWUs, as shown in 
Section 25.4.2.

Many VP idioms, like spill the beans and let the cat out of the bag, are considered to be at
least partially analysable, in that speakers assign a reading to one or more of their constituents.
Nunberg et al. (1994) call ‘idiomatically combining expressions’ or ‘internally regular’
those idioms whose parts can be given a literal interpretation; in Fillmore et al.’s (1988)
terminology, they are ‘encoding’. For example, in spill the beans, spill can be interpreted
as ‘reveal’ and the beans as ‘information’ or ‘secret’ (see also Fellbaum 1993); similarly, cat
in let the cat out of the bag refers to sensitive or secret information, while bag refers to an
(abstract) enclosure or hiding place. Each component of such idioms can be semantically
interpreted, and the combination of these meanings constitutes a string whose form and
meaning correspond in a one-to-one fashion to that of the idiom. By contrast, Nunberg
et al. (1994) note that ‘internally irregular’ idioms like kick the bucket and buy the farm
are not compositional, as the constituents of a transitive verb construction cannot be
semantically interpreted and re-assembled to map onto the literal reading of the intransitive
verb, ‘die’.⁵


25.4.1 Metaphors in Idioms 


Metaphorical idioms (e.g. Wood 1986) are compositional in that one or more of their 
constituents can be interpreted as a metaphor independent and outside of the idiomatic 
context. For example, fire is a conventional metaphor and readily interpretable as a 
potential or real danger in idioms like play with fire, pull the chestnuts out of the fire, be 
in the line of fire; the same metaphor is reflected in the meaning of get burned, ‘suffer a
setback’ (Lakoff and Johnson 1980). By contrast, idiom components like cat in let the cat
out of the bag are context-specific metaphors; cat cannot be freely used to mean ‘secret
information’. While one might argue for dictionary entries for conventional metaphors
like fire that exhibit a certain degree of distributional freedom, entries for highly con- 
text-specific metaphors like cat do not seem warranted. 


25.4.2 Variation in Idioms 


Perhaps the greatest challenge for the lexical treatment of idioms is that very few are
completely fixed; most allow for variations, often as rich as those of freely composed strings.
Consequently, lexical entries casting them as frozen sequences of words, while correctly 
capturing their semantic unity, do not do justice to their usage. 

Fillmore et al. (1988) note that most idioms allow at least verbal inflection, such as 
variation in tense and number of the lexically unfilled subject. The misconception that 
idioms are frozen is probably due to the fact that much of the literature on idioms and 
collocations is based on data derived via introspection. Moon (1998) was one of the first 
comprehensive studies of English idioms based on corpora, and her data challenge a 
simple integration of idioms into any theoretical grammatical framework. Neumann 
et al. (2004) and Fellbaum (2006, 2007) examine German idioms using a one billion 
word corpus. Search queries that allow for the retrieval of lexical and syntactic variations
of the idioms’ canonical forms (Herold 2007) return numerous examples of variations 
and demonstrate that most idioms participate in the regular grammatical processes 
associated with free language. Strikingly, the data refute the prevailing view that an idi- 
om’s variability is entirely conditional on its semantic transparency, as articulated in, for 
example, Nunberg et al. (1994). Flexibility cannot be straightforwardly accounted for in 
terms of semantic compositionality. 


5 Interestingly, idioms in other languages also encode ‘die’ with transitive verbs; cf. German den Löffel
abgeben (‘hand in the spoon’) and French casser sa pipe (‘break one’s pipe’).
25.4.2.1 Lexical Variation


Corpus data show that in many idioms a constituent can be exchanged for another, 
semantically related lexeme. Quite often, such substitution has a playful character and 
alludes to a specific event or state of affairs. For example, when the secret life of golf 
champion Tiger Woods was revealed, newspaper readers encountered headlines such 
as Who let the Tiger out of the Bag? Besides playing on the golfer’s name, the use of tiger 
rather than cat here implies the considerable magnitude of the scandal. Such variations 
are highly context- and situation-specific, tend to be found in the public media, and are 
usually short-lived. 

Besides substitution of the idioms’ noun components, corpus data show that adjec- 
tives are often added to the nouns (Fellbaum and Stathi 2006; Stathi 2007). Ernst (1981) 
calls ‘external modification’ cases where the adjective in fact modifies the entire idiom, 
much like an adverb, as in 


Carter doesn't have an economic leg to stand on


Many people were eager to jump on the horse-drawn Reagan bandwagon 


These examples show further that many of the attested variations of common idioms 
play on specific situations, events, and people. 

Another kind of variation, where a speaker adapts a noun phrase’s determiner or 
number, is found particularly often with metaphors and nouns that can be mapped 
onto a referent. An attested example is more than one cat was let out of the bag 
that night. 

Lexical variation extends to changes of category of the major constituents. Thus, 
Moon (1998) cites corpus data where a verb has been turned into a noun: 


lose face/loss of face 
waste one’s breath/a waste of breath 


break the ice/ice-breaker 


25.4.2.2 Syntactic Variation 


Idioms are also subject to considerable syntactic operations. Moon (1998) cites corpus 
examples of passivization of English idioms (Mary’s teeth were gnashed as the home team 
went down in defeat), relativization (That is a bullet on which the Arthur Golds of this 
world have steadfastedly failed to bite), and pronominalization (if there is ice, Mr. Clinton 
is breaking it). Other examples found on the Web include the following: 


Beyoncé has finally cozied up to the cat that was let out of the bag months and 
months ago 
If those experiences were the apex of their lives, why wouldn't they just go on repeating
them, over and over, until the bucket was finally kicked? 


The evidence clearly refutes the common claim that idioms are fixed unless their components
can be semantically interpreted. Cases like the passivization of the semantically
unanalysable bucket also call into question the notion of a continuum of
flexibility that interacts with semantic transparency, as proposed by Abeillé (1995),
who based her analysis on a number of French idioms, and Dobrovol’skij (1999),
inter alia.


25.5 CONSEQUENCES FOR THE LEXICAL 
REPRESENTATION OF MWUs 


The wide range of MWUs examined here all share the property of being semantic units; 
as such, they must be included in lexical resources.⁶ But, given the considerable variation
of many MWUs, their representation as ‘long words’, which suggests fixedness,
seems inadequate. At the same time, it is clearly impossible to give a comprehensive 
account of the variability and its limitations. We examine three kinds of lexicons and 
their treatment of MWUs. 


25.5.1 Virtual Lexicons 


I call ‘virtual lexicons’ models of the lexicon constructed by linguists and psycholin- 
guists. One challenge for these lexicons is to account in a systematic and comprehen- 
sive way for people’s linguistic behaviour with respect to MWUs. Virtual lexicons tend 
to discuss a small number of cases and do not strive to construct large-scale resources. 
Soehn (2006), working in the Head-driven Phrase Structure Grammar (HPSG) frame- 
work, borrows the well-established lexicographic principle of cross-listing. He dis- 
tinguishes between metaphoric components of idioms and semantically opaque ones 
(called ‘listemes’, following Di Sciullo and Williams’s (1987) distinction between regular,
class-based, and irregular, listed items in the lexicon). Constituents of decomposable 
idioms are encoded in a lexical entry together with a literal string (e.g. spill, divulge). In 
Soehn’s model of the lexicon, the string in an idiom selects its complement (beans) via its 
listeme value. 


6 There is also rich evidence that speakers store multi-word expressions as units in their mental
lexicons, notwithstanding the fact that they can serve as input to all the grammatical processes available 
to simple words. 
25.5.2 Traditional Paper Dictionaries


I sample a few dictionaries focusing on collocations that specifically target learners of 
English. Benson et al.’s (1986) BBI Combinatory Dictionary of English: A Guide to Word
Combinations is an exemplary resource for learners of English. BBI lists grammatical and 
lexical collocations. Eight types of grammatical collocations are distinguished, based on 
their syntax (e.g. preposition + noun, as in by accident, at anchor and predicate adjective +
to + infinitive, as in ready to go and difficult to convince). The patterns applying to each head- 
word are indicated by means of a code. Lexical collocations are phrases characterized not by 
their syntactic structure but by their lexical make-up, which is more or less fixed, arbitrary, 
and not predictable. Thus, BBI lists the verbs commit and attempt in the entry for suicide,
implicitly guiding the learner to avoid producing combinations like perpetrate suicide. 

The Oxford Collocations Dictionary for Students of English (2009) by Colin McIntosh 
lists for each of its 9,000 headwords the most common collocates (lexical collocations), 
with illustrative examples drawn from the two billion word Oxford English Corpus, and 
specifies the part of speech. For example, the entry for ablaze lists the verbs be and set, 
and specifies that with is the head of the prepositional phrase following ablaze (every 
window was ablaze with light). 

Boatner et al.’s Dictionary of American Idioms (1975) is not based on corpus data,
reflecting an older lexicographic tradition. It treats idiomatic MWUs as fixed long words.
While such information can convey their meaning to a user who encounters an unknown 
expression, it does not tell him how to productively use the expression. Moreover, if the 
speaker encounters a token of the expression that includes a lexical variation, he may not 
find the MWU in its canonical form in the dictionary. Like many resources, Boatner et al. 
make use of cross-listings. Thus give up the ghost can be found under the entry for give, 
and looking up ghost refers the user back to the entire idiom. But, as in the case of paper 
dictionaries that were compiled under economic constraints, cross-listing is not consist- 
ent; thus, sell down the river is included in the entry for sell but not for river. 


25.5.3 Electronic Lexical Resources for Collocations 


Modern lexicographic resources are no longer bound to the paper format. Electronic 
resources are not only unconstrained as concerns their size but they offer the possibil- 
ity of rich cross-referencing and linking among entries and thus more points of access 
to a given wordform or meaning. Moreover, the availability of electronic corpora now 
makes corpus-based lexicography possible, reflecting a broad, varied speaker com- 
munity rather than data constructed by the lexicographer. In fact, the division between 
corpus and lexicon melts away, and the lexicographer’s task can be reduced to the anno- 
tation of corpus data rather than the crafting of lexical entries.

An example of a corpus-based lexical resource for idioms is the German colloca- 
tion database (Fellbaum 2006, 2007). One thousand selected German verb phrase idi- 
oms were analysed by searching a one billion word corpus (Klein and Geyken 2010). 
The corpus was searched using flexible regular expressions involving wordforms 
characteristically, although not necessarily, associated with a given idiom (e.g. German
Gras, beissen, lit. ‘grass’, ‘bite’, corresponding to English bucket, kick in the idiom ins
Gras beissen). Corpus tokens showed that idioms that are traditionally represented as
fixed strings in fact exhibit a rich variety that extends not only to syntax (passivization, 
focusing, etc.) but also to lexical substitution and modification (see Section 25.4.2). 
The corpus examples were manually sorted and classified according to their lexical 
and morphosyntactic signatures. An online interface allows the user to search for par- 
ticular expressions and all variations with illustrative corpus examples that provide an 
impression of the idioms’ flexibility, although no exhaustive account of their full use can 
be given. 
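
The kind of flexible query described above can be illustrated with a minimal Python sketch. It is not the search machinery of the German project: the function name idiom_pattern, the max_gap parameter, and the three-sentence toy corpus are all assumptions made for illustration. The pattern looks for a form of the verb and a form of the noun in either order, separated by a limited number of intervening words, so that passivized or modified tokens are retrieved alongside the canonical form.

```python
import re

def idiom_pattern(verb_forms, noun_forms, max_gap=6):
    """Match a verb form and a noun form in either order,
    with at most `max_gap` words between them (hypothetical helper)."""
    verb = r"\b(?:%s)\b" % "|".join(map(re.escape, verb_forms))
    noun = r"\b(?:%s)\b" % "|".join(map(re.escape, noun_forms))
    # Lazily skip up to max_gap intervening words, then one separator.
    gap = r"(?:\W+\w+){0,%d}?\W+" % max_gap
    return re.compile(f"{verb}{gap}{noun}|{noun}{gap}{verb}", re.IGNORECASE)

# Toy corpus, invented for this example.
corpus = [
    "He finally kicked the bucket.",
    "They went on until the bucket was finally kicked.",
    "She kicked the ball into the net.",
]

pattern = idiom_pattern(["kick", "kicks", "kicked", "kicking"],
                        ["bucket", "buckets"])
hits = [sentence for sentence in corpus if pattern.search(sentence)]
for sentence in hits:
    print(sentence)
```

A query of this sort deliberately over-generates: it would also retrieve literal uses in which kick and bucket happen to co-occur, which is one reason the retrieved tokens still have to be sorted and classified by hand.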

The online Oxford Collocation Dictionary for Advanced Learners of English (ozdic.com)
gives rich information about a word's collocational properties grouped by part of
speech. For example, the entry for heart + Verb lists many expressions, including idi- 
osyncratic combinations and idioms, with corpus examples: 


jump, leap, lurch, miss/skip a beat Her heart leapt with joy. | ache My heart aches when 
I think of their sorrow. | desire sth everything your heart could desire | sink | go out Our 
hearts go out to (= we sympathize deeply with) the families of the victims. 


The Oxford Collocation Dictionary for Advanced Learners of English includes a sepa- 
rate category PHRASES. Looking up hand and ear, one finds fall into the wrong hands 
and fall on deaf ears, respectively. But even digital resources do not offer comprehensive 
cross-listings. 


25.5.4 Corpus-based Dynamic Resources 


Corpus data not only aid learners by giving them direct access to attested examples, 
but are important in providing lexicographers with representative data on which to 
base a lexical entry. The Sketch Engine (Kilgarriff et al. 2004) is an important tool that, 
for a given target word, returns frequent and prototypical examples from several cor- 
pora. Lexicographers can choose and add corpora so that the resource has a dynamic 
character. 

The Sketch Engine provides ‘sketches’ of the keyword, consisting of corpus lines 
(key-word-in-context, or KWIC lines). It provides a frequency score, based on the cor- 
pus, of words that co-occur with the keywords. Moreover, the collocates are classified 
in terms of their syntactic relation to the keyword (modifier, modified-by, object-of, 
etc.). Importantly, the Sketch Engine allows one to compare several words with respect 
to their frequent collocates. This reflects both the uniqueness of a given word in an 
MWU (and hence the fixedness of the expression) and the substitutability of similar
lexemes in a collocation. Unlike the frozen, stative database of the German collocation 
project, the Sketch Engine allows the user to create a customized corpus from selected 
corpora and corpus examples. The larger the corpus, the more likely the inclusion of less 
prototypical and infrequent variations, which tend not to be found in resources where 
the lexicographer necessarily has to omit marginal examples. Thus, resources like the
Sketch Engine potentially expose the user to the full range of variation of an MWU. 

Several resources were built with Sketch Engine corpus query technology. One is Dante 
(Database of Analysed Texts of English, <http://www.webDante.com>), a lexicon based 
on a 1.7 billion word corpus that gives fine-grained descriptions for tens of thousands of
English idioms, phrasal verbs, and compounds. For example, the entry for the keyword 
benchmark contains the Light Verb Construction (called ‘chunk’ in Dante) set a benchmark.

An improvement on static lexical resources like the German collocation dictionary 
is the automatic collocation dictionary ForBetterEnglish.com (Kilgarriff et al. 2008). A 
user can enter a word into the Web-based interface and ForBetterEnglish returns the 
most frequent collocates, based on large corpora, each with the particular syntactic 
configuration. Thus, the query disaster returns the information that, as an object, this 
word occurs most frequently with the verbs avert, spell, loom; as a subject, it is most
often followed by the verb strike. Its most frequent modifiers are natural, humanitarian,
unmitigated, man-made, impending, ecological. Disaster compounds most frequently
with recovery, preparedness, relief as a modifier and with tsunami, earthquake, mining, 
and WTC as the head. For each collocation, a representative corpus example is given. 
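
A profile of this kind, with collocates grouped by syntactic relation and ranked by frequency, can be sketched in a few lines of Python. The sketch assumes that the corpus has already been parsed into (keyword, relation, collocate) triples; the function name collocate_profile, the relation labels, and the counts implied by the toy triples are invented for illustration and do not reproduce the data or the ranking statistics of any of the resources discussed here.

```python
from collections import Counter, defaultdict

def collocate_profile(triples, keyword, top_n=3):
    """Group a keyword's collocates by grammatical relation,
    most frequent first (hypothetical helper)."""
    by_relation = defaultdict(Counter)
    for word, relation, collocate in triples:
        if word == keyword:
            by_relation[relation][collocate] += 1
    return {relation: [c for c, _ in counts.most_common(top_n)]
            for relation, counts in by_relation.items()}

# Toy dependency triples, invented for this example.
triples = [
    ("disaster", "object_of", "avert"),
    ("disaster", "object_of", "avert"),
    ("disaster", "object_of", "spell"),
    ("disaster", "subject_of", "strike"),
    ("disaster", "modifier", "natural"),
    ("disaster", "modifier", "natural"),
    ("disaster", "modifier", "impending"),
    ("benchmark", "object_of", "set"),
]

profile = collocate_profile(triples, "disaster")
for relation, collocates in profile.items():
    print(relation, collocates)
```

Raw frequency is the simplest possible ranking; a production tool would normally use an association measure so that very common words do not dominate every list.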

Electronic, corpus-based lexicography opens up new possibilities for a more comprehensive
representation of speakers’ language. Not only does it allow access to the language
of a broad swathe of a linguistic community and no longer rely on the introspection and 
idiolects of a handful of lexicographers, but it can display the full spectrum of syntactic and 
lexical variations that must be accounted for when one tries to describe the lexicon as a 
complex component of our linguistic knowledge.


25.6 SUMMARY AND CONCLUSION 


The typology of multi-word collocations and idioms presented here shows that those 
that exhibit a certain degree of syntactic and semantic idiosyncrasy and thus deserve
inclusion in a lexical resource do not easily fall out. As lexical items, many are only 
partially filled and allow of considerable variation; this is true even for semantically 
non-compositional idioms that are subject to all regular grammatical processes. 

Their status as lexical units on the one hand, and their considerable flexibility on 
the other hand make many MWUs a challenge for lexical treatment. There are no 
easy solutions for an optimal way to represent MWUs in a way that could inform a
speaker—and especially a learner—about the full range of their use. The challenge of 
representing not only their meaning but also the extent and the limits of their usage 
entails crossing the traditional boundary between the lexicon and the morphosyntactic
component of speakers’ grammatical knowledge. On the basis of some specific cases, 
it was shown how some leading lexical resources try to meet this challenge; each nec- 
essarily has some deficiencies. Resources that combine corpus data with lexicographic 
descriptions carry great promise for doing justice to speakers’ rich use of MWUs. 
26.1 INTRODUCTION


In the canonical compartmentalization of linguistics, lexicography belongs to the 
domain of applied linguistics. Its theoretical counterpart, then, would be lexicology, 
and specifically with regard to the description of meaning, lexical semantics. But lexi- 
cography is not always a direct application of a lexicological or a semantic theory, and 
to the extent that lexical semantic theory has actually played a role, the plurality of 
theories of word meaning needs to be taken into account. This chapter, then, takes 
a historical perspective: for each of the major theoretical approaches to word mean- 
ing that have successively emerged in the course of the last 150 years, we will consider 
what their contribution to lexicography has been. Adopting a considerably simplified 
version of the classification in Geeraerts (2010a), three main stages in the history of
lexical semantics will be distinguished: historical-philological semantics, structural- 
ist and neostructuralist semantics, and cognitive semantics. In each of the sections 
devoted to these three theoretical frameworks, the first subsection contains a short 
general introduction (for more detail the reader is referred to Geeraerts 2010a), while
the second subsection describes the lexicographical connections. The fourth and final 
section of the chapter reviews the overall evolution and explores possible lines for 
future development. 
26.2 HISTORICAL-PHILOLOGICAL
SEMANTICS 


26.2.1 A Characterization of Historical-Philological 
Semantics 


The first stage in the history of lexical semantics as an academic discipline runs from
roughly 1830 to 1930. Its dominant characteristic is the historical orientation of lexical 
semantic research: its main concern lies with changes of word meaning—the identifica- 
tion, classification, and explanation of semantic changes. The overall characteristics of 
this approach, which is iconically represented by such major figures as Bréal (1897) and 
Paul (1920), can be introduced from a descriptive and from a theoretical perspective. 

Descriptively speaking, classifications of semantic change are the main empiri- 
cal output of historical-philological semantics, and an in-depth study of the 
historical-philological era would primarily take the form of a classification of such clas- 
sifications. Characteristically, the classificatory efforts of historical-philological seman- 
tics do not stop at the level where we find well known high-level phenomena such as 
metaphor and metonymy, but they also search for more specific patterns and mecha- 
nisms of semantic development. While Carnoy’s (1927) and Stern’s (1931) classifications 
of semantic changes represent the final stage of historical-philological semantics, it is 
typical that in systems like Stern’s and Carnoy’s, the classificatory depth is considera- 
ble: basic categories are divided into sub-classes, which may then be divided into fur- 
ther sub-classes, and so on, almost ad infinitum. As a consequence, works like Carnoy 
(1927) and Stern (1931), but also Nyrop (1913) or Waag (1908), remain copious treasures 
of examples for anyone interested in processes of semantic change: regardless of the 
classificatory framework they employ, the wealth of examples amassed in these works 
continues to amaze. 

Theoretically, the dominant conception of meaning of historical-philological seman- 
tics may be characterized in two ways. First, it is a psychological conception of meaning, 
in a double sense. Lexical meanings are considered to be psychological entities, that is 
to say, (a kind of) thoughts or ideas. Further, meaning changes are explained as result- 
ing from psychological processes; the general mechanisms that are supposed to under- 
lie semantic changes, and whose presence can be established through the classificatory 
study of the history of words, correspond with patterns of thought of the human mind. 
A concept like metonymy, for instance, is not just a linguistic concept, it is also a cogni- 
tive capacity of the human mind. The second important feature of the theoretical posi- 
tion of historical-philological semantics is the importance it attaches to the contextual 
flexibility of meaning: meanings change over time, but in order to explain that change, 
we need to take into account how meanings change in specific contexts. (These features, 
as we shall see later, are quite important when we review the overall evolution of lexical 
semantics.) 
26.2.2 The Lexicographic Impact
of Historical-Philological Semantics 


The relationship between historical-philological semantics and lexicography is a close 
one, as can be readily appreciated from the opening statement of Hecht’s Griechische 
Bedeutungslehre (1888): 


Insofern sie [i.e. die Bedeutungslehre] zugunsten der Lexikographie die
Bedeutungen in zeitlicher Folge ordnet und im Interesse der Etymologie die Gesetze
der Bedeutungsänderung aufstellt, hat sie sprachwissenschaftlichen Wert. Soweit
sie aber diese Gesetze aus der Natur des Geistes herleitet und eine Geschichte der
Vorstellungen gibt – Bedeutungen sind Vorstellungen –, fällt sie auf das Gebiet der
empirischen Psychologie.


([Semantics] is linguistically valuable to the extent that it chronologically classi- 
fies meanings in the interest of lexicography, and writes down the laws of seman- 
tic change in the interest of etymology. To the extent, however, that it derives these 
laws from the nature of the mind and that it writes a history of ideas – meanings are
ideas –, it falls within the realm of empirical psychology.) (Hecht 1888: 5–6)


This quotation illustrates the features sketched in Section 26.2.1: diachronic semantics 
is concerned with the classification of mechanisms of semantic change, and in doing so 
assumes a psychological conception of meaning, one in which the linguistic phenomena 
under study are seen as revealing characteristics of the human mind. But the quotation 
also reveals the symbiotic relationship between historical-philological semantics and 
historical lexicography: the classification of mechanisms of semantic change is not just 
important in itself, as a branch of historical linguistics, but it has practical importance 
for diachronic lexicography. 

And historical dictionaries, to be sure, constituted the top category of nineteenth- 
century lexicography. The nineteenth century in fact witnessed the birth of the large- 
scale descriptive dictionary on diachronic principles, that is, the historical dictionary 
that intended to chart the development of the language from the earliest origins to the 
present day. Major examples include the Deutsches Wörterbuch (started by Jacob and
Wilhelm Grimm; 1854–1954), the Dictionnaire de la langue française (by Émile Littré;
1877), the Oxford English Dictionary (James Murray et al.; 1884-1928), and—the larg- 
est dictionary in the world by any count—the Woordenboek der Nederlandsche Taal 
(started by Matthias de Vries in 1864; finished 1998). (See further Considine, this vol- 
ume.) Crucially, these grand historical dictionary projects derive from the same con- 
cern as diachronic lexical semantics: a fascination with the correct description of the 
historical development of words and meanings. As such, these dictionaries were not just 
situated at the receiving end of historical-philological theory formation: by providing 
the diachronic semantician with examples of lexical changes, drawn from a corpus of 
historical texts on which to compile the dictionary, they also contributed to the theoreti- 
cal endeavours. 
The relationship between historical-philological semantic theory and diachronic lexicography
is a balanced one, in other words: both contribute to each other. On a more
mundane level, this intellectual link between theoretical semantics and lexicographical 
practice shows up in the fact that a number of important theoreticians in the field of 
diachronic lexical semantics were at the same time the editor of a major dictionary: Paul 
compiled a Deutsches Wörterbuch (1897), and Darmesteter (author of a successful treatise
on ‘the life of words’, 1887) co-edited a Dictionnaire général de la langue française
(Darmesteter and Hatzfeld 1890).


26.3 STRUCTURALIST 
AND NEOSTRUCTURALIST SEMANTICS 


26.3.1 A Characterization of Structuralist 
and Neostructuralist Semantics 


Taking its inspiration from the structuralist conception of language that is basically 
associated with the work of Ferdinand de Saussure, structuralist lexical semantics is the 
main inspiration for innovation in word meaning research from the 1930s until well into 
the 1960s. The central idea is the notion that language has to be seen as a system, and not 
just as a loose bag of words, and in addition, that such a system is primarily a synchronic 
and not a diachronic phenomenon. Among the large variety of theoretical positions 
and descriptive methods that emerged within the overall lines set out by a structuralist 
conception of meaning, four broad strands may be distinguished: lexical field theory, 
componential analysis, relational semantics, and distributional semantics. Since these 
approaches are fairly well known, a few words may suffice to introduce them. 

Lexical field theory as introduced by Weisgerber (1927) and Trier (1931) takes its 
starting-point in the structuralist view that language constitutes an intermediate con- 
ceptual level between the mind and the world, and translates that view into the meta- 
phorical notion of a lexical field: if you think of reality as a space of entities and events, 
language so to speak draws lines within that space, dividing up the field into conceptual 
plots. A lexical field, then, is a set of semantically related lexical items whose meanings 
are mutually interdependent and that together provide conceptual structure for a cer- 
tain domain of reality. In practical terms, this translates into the requirement that words 
naming kitchen utensils, or kinship terms, or verbs of movement (to name just a few 
possibilities) should not be described in isolation, but should be investigated as a set of 
interdependent items. 

Componential analysis is a logical development from lexical field theory: once 
you have demarcated a lexical field, the internal relations within the field have to be 
described in more detail. Componential analysis is a method for describing such oppositions
that takes its inspiration from structuralist phonology: just like phonemes
are described structurally by their position on a set of contrastive dimensions, words
may be characterized on the basis of the dimensions that structure a lexical field. 
Componential analysis was developed in the second half of the 1950s and the begin- 
ning of the 1960s by European structuralist linguists (Coseriu 1962; Pottier 1964) as well 
as American anthropological linguists (Conklin 1955; Goodenough 1956; Lounsbury 
1956), although largely independently of each other. Its major impact came not from 
its structuralist or anthropological background, but from its incorporation into generative
grammar: the appearance of Katz and Fodor’s seminal article ‘The structure of
a semantic theory’ (1963) introduced a formalized type of componential analysis into
formal grammar. 
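
The basic idea of componential analysis can be made concrete with a small Python sketch, offered purely as an illustration. Each word in a lexical field is represented as a bundle of values on a few contrastive dimensions, and two words are compared by listing the dimensions on which they differ. The three dimensions chosen here (sex, generation, lineal), their values, and the function name contrast are assumptions made for the example; they are not taken from any of the analyses cited above.

```python
# A toy kinship field: each term is a bundle of feature values.
# The dimensions and values are illustrative assumptions.
KINSHIP = {
    "father":   {"sex": "male",   "generation": +1, "lineal": True},
    "mother":   {"sex": "female", "generation": +1, "lineal": True},
    "uncle":    {"sex": "male",   "generation": +1, "lineal": False},
    "aunt":     {"sex": "female", "generation": +1, "lineal": False},
    "son":      {"sex": "male",   "generation": -1, "lineal": True},
    "daughter": {"sex": "female", "generation": -1, "lineal": True},
}

def contrast(word_a, word_b, field=KINSHIP):
    """Return the dimensions on which two words of the field differ,
    with the pair of values each word takes (hypothetical helper)."""
    a, b = field[word_a], field[word_b]
    return {dim: (a[dim], b[dim]) for dim in a if a[dim] != b[dim]}

print(contrast("aunt", "uncle"))
print(contrast("father", "son"))
print(contrast("mother", "uncle"))
```

Words that differ on a single dimension, such as aunt and uncle, form the minimal oppositions that the method treats as the lexical analogue of a phonological contrast.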

Relational semantics further develops the idea of describing the structural relations 
among related words, but restricts the theoretical vocabulary that may be used in such a 
description. In a componential analysis, descriptive features like gender and generation 
in a system of kinship vocabulary are real-world features; they describe the real-world 
characteristics of the things the words refer to. But structuralism is interested in the 
structure of the language rather than the structure of the world outside of language, and 
so it may want to use a different type of descriptive apparatus, one that is more purely 
linguistic. Relational semantics, as introduced by Lyons (1963), looks for such an appa- 
ratus in the form of sense relations like synonymy and antonymy: the fact that aunt and 
uncle refer to the same genealogical generation is a fact about the world, but the fact that 
black and white are opposites is a fact about words and language, and as such, it is the 
preferred perspective for a structuralist approach to semantics. (On approaches to syn- 
onymy and antonymy see also Murphy, this volume.) 

Distributional semantics takes its starting-point in the syntagmatic relations that 
words entertain, in contrast with the previous approaches, which start from para- 
digmatic relations (like the similarity of words in a field). One straightforward way 
of introducing syntagmatic relations into lexical semantics (pioneered by Porzig 
1934) is simply by describing the combinatorial potential of words, such as the fact 
that blonde is typically predicated of hair, or beer. But syntagmatic information may 
also play a methodological role: if the environments in which a word occurs could 
be used to establish its meaning, structuralist semantics could receive a firm meth- 
odological basis. The general approach of such a distributionalist method is sum- 
marized by Firth’s famous dictum: ‘You shall know a word by the company it keeps’ 
(1957b: 11). A similar assumption is expressed by the ‘distributional hypothesis’ as 
formulated by Harris (1954): words that occur in the same contexts tend to have 
similar meanings. 
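
The distributional hypothesis can be made concrete with a minimal Python sketch. The three toy sentences, the function names, and the window of two words are assumptions introduced purely for illustration; they do not reproduce Harris's or Firth's own procedures.

```python
import math
from collections import Counter

def context_vector(sentences, target, span=2):
    """Count the words found within `span` tokens of each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok == target:
                counts.update(tokens[max(0, i - span):i])   # left context
                counts.update(tokens[i + 1:i + 1 + span])   # right context
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented mini-corpus (illustrative only).
corpus = [
    "we drank cold beer at the bar",
    "we drank cold lager at the bar",
    "she read the long letter at home",
]
beer = context_vector(corpus, "beer")
lager = context_vector(corpus, "lager")
letter = context_vector(corpus, "letter")

# Words sharing contexts (beer, lager) come out as more similar than
# words that share only function words (beer, letter).
print(cosine(beer, lager) > cosine(beer, letter))
```

On a realistic corpus the contexts would of course be far noisier; the sketch only shows how 'occurring in the same contexts' can be turned into a comparable quantity.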

The main types of structuralist semantics as listed here were developed between the 
early 1930s and the late 1960s, but structuralist theorizing did not come to a halt at that 
point. From the 1970s up to the present day, many approaches were developed that took 
at least part of their inspiration from structuralist thinking, but that often combined 
it with an additional interest: an interest in formalization, or an interest in questions 
of the type that are central to cognitive semantics (to be discussed in Section 26.4). 
Approaches aiming at some form of logical or symbolic formalization, such as 
Jackendoff’s Conceptual Semantics (1972) and Pustejovsky’s Generative Lexicon (1995), 
are less important in the context of the present chapter. (On Pustejovsky’s approach see 
further Hanks, this volume.) Formal approaches in general do have links with compu- 
tational semantics (the description of meaning in natural language in the context of 
computational linguistics), and hence with the compilation of machine-readable dic- 
tionaries as components of NLP systems, but that is a form of lexicography that falls 
outside the scope of this chapter. More interesting for general lexicography are the neo- 
structuralist theories that have specific links with general lexicography; these are dis- 
cussed in the next subsection. 


26.3.2 The Lexicographic Impact of Structuralist and 
Neostructuralist Semantics 


Structuralist approaches had an overall impact on dictionary making in the sense that 
they heightened the awareness of general lexicographers for specific types of infor- 
mation in the dictionary. For instance, while synonyms are clearly a time-honoured 
element of lexicography, structuralist thinking would provide a stimulus for a more sys- 
tematic incorporation of synonyms and near-synonyms. This impact of structuralism is 
reflected in the fact that Zgusta (1971), the first influential textbook on lexicography, was 
firmly couched in a structuralist way of thinking. Zooming in on this general and per- 
vasive influence, we may now have a look at the way in which the different theoretical 
forms of structuralism (as reviewed in the previous subsection) led to specific sugges- 
tions and projects for lexicography. 

Lexical field theory led to renewed attention in lexicography for onomasiological 
dictionaries, that is, reference works that organize vocabularies not on an alphabetical
basis, but on the basis of the semantic association between words, like thesauruses
and synonym dictionaries. Such onomasiological dictionaries have a long pedigree in
practical lexicography (see for instance Hüllen 1999), but in the structuralist era they
received specific attention in theoretical lexicography, and new thematically organized 
dictionaries were developed. A selection of the relevant literature includes Dornseiff 
(1959) as an example of an actual dictionary project, and Hallig and von Wartburg 
(1952), Glinz (1954), Von Wartburg (1957), and Baldinger (1960) as examples of the theo- 
retical reflection triggered by structuralist semantics. Compare also the work leading up 
to the Historical Thesaurus of the Oxford English Dictionary, and further discussion in 
Kay and Alexander (this volume). 

Currently componential analysis is probably most influential in the neostructural- 
ist formulation it receives in Wierzbicka’s Natural Semantic Metalanguage approach 
(Wierzbicka 1985; Goddard 1998b). This is an informal decompositional approach to 
the description of meaning in natural languages that assumes an allegedly universal 
set of about sixty primitive concepts. (For more detail on this approach see Hanks, this 
volume.) As such, it presents an alternative to the classical componential approach: it
abandons the idea that meaning components derive from distinctive oppositions within
a lexical field, but rather assumes that there exists a universal set of semantic primi- 
tives that may be discovered by defining words through a process of reductive para- 
phrase. These universal primitives constitute the definitional language of the Natural 
Semantic Metalanguage framework. Because the primitives are assumed to be univer- 
sal, definitions in the Natural Semantic Metalanguage framework are claimed to be 
cross-linguistically intelligible. The theoretical foundations of the framework are not 
beyond discussion, though (Geeraerts 1999), and the practical value of the compli- 
cated definitions produced in a reduced language of sixty-plus elements may also be 
questioned. 

Relational semantics is lexicographically most conspicuous in the WordNet pro- 
ject. WordNet is a practical application of the concept of sense relations: it provides 
a lexical database for English and a growing number of other languages organ- 
ized according to sense relations; see Miller and Fellbaum (2007). In the WordNet 
database, nouns, verbs, adjectives, and adverbs are grouped into sets of synonyms; 
these synonym sets (commonly known as synsets) and the lexical items they con- 
tain are mutually linked by means of sense relations. As WordNet provides a freely 
accessible, large-scale database for English and other languages, it is widely used in 
computational linguistics as a resource for lexical information. At the same time, it 
is subject to a number of restrictions (which the developers are ready to acknowl- 
edge): the subtler distinctions among the elements of a synset are beyond the scope 
of the description, syntagmatic relations are not envisaged, and at least in some 
cases the set of sense relations could be refined (specifically, there is no differentia- 
tion between different types of antonyms). (Compare also discussion of WordNet 
in Hanks, this volume.) 
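
The organization into synsets linked by sense relations can be pictured with a small Python sketch. It is a toy data structure under stated assumptions: the identifiers, the relation labels, and the helper functions are invented for illustration and do not reproduce WordNet's actual file format or programming interface.

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    """A set of synonymous lexical items plus labelled links to other synsets."""
    ident: str
    lemmas: set
    relations: dict = field(default_factory=dict)  # label -> list of synset ids

# Hypothetical identifiers and contents, chosen for illustration only.
synsets = {
    "car.n": Synset("car.n", {"car", "automobile"}, {"hypernym": ["vehicle.n"]}),
    "vehicle.n": Synset("vehicle.n", {"vehicle"}, {"hyponym": ["car.n"]}),
}

def are_synonyms(word_a, word_b):
    """Two words count as synonyms here if some synset contains both."""
    return any(word_a in s.lemmas and word_b in s.lemmas for s in synsets.values())

def hypernym_lemmas(word):
    """Collect the lemmas of all synsets one 'hypernym' link above `word`."""
    found = set()
    for s in synsets.values():
        if word in s.lemmas:
            for target in s.relations.get("hypernym", []):
                found |= synsets[target].lemmas
    return found

print(are_synonyms("car", "automobile"))
print(hypernym_lemmas("automobile"))
```

The sketch also makes one of the restrictions mentioned above visible: inside a synset, car and automobile are simply treated as equivalent, so any subtler difference between them falls outside the representation.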

The Meaning-Text theory developed by Mel’čuk (1988; Mel’čuk et al. 1995) constitutes
a lesser known but no less interesting variant of the relational approach. 
Semantic relations may in fact be defined more broadly than in the highly selective 
approach introduced by Lyons. The observation that the person in charge of a fac- 
ulty is called a dean would not normally lead to postulating a lexical relation ‘head 
of’ between faculty and dean. However, the same relation exists between board and 
chairman, ship and captain, airplane and captain, school and headmaster or director, 
army and general, company and CEO, tribe and chief, and a number of other lexical 
sets. In the Meaning-Text theory, frequently occurring relations of this type are iden- 
tified as lexical functions. Their descriptive scope is not restricted to the relationship 
between lexical items, because similar relations may also pertain to the domain of 
morphology and phraseology: the relationship between city and urban is the same 
as that between function and functional, and the same function that links joy to joy- 
fully also yields with joy. The Meaning-Text theory now distinguishes more than sixty 
lexical functions. They occupy a central position in the Explanatory Combinatorial 
Dictionary (Mel’čuk et al. 1984-99) that is the main practical achievement of 
Meaning-Text theory. 
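
The idea of a lexical function as a recurrent mapping from a keyword to related expressions can be sketched in Python. The word pairs are those given in the paragraph above; the label `HEAD_OF` and the two helper functions are informal placeholders assumed for this sketch, not the notation used in Meaning-Text theory itself.

```python
# A lexical function maps a keyword to the expression(s) standing in a
# recurrent relation to it. "HEAD_OF" is an informal label taken from the
# prose, not the notation of Meaning-Text theory.
HEAD_OF = {
    "faculty": ["dean"],
    "board": ["chairman"],
    "ship": ["captain"],
    "airplane": ["captain"],
    "school": ["headmaster", "director"],
    "army": ["general"],
    "company": ["CEO"],
    "tribe": ["chief"],
}

def apply_function(table, keyword):
    """Return the values of a lexical function for `keyword` (empty if unlisted)."""
    return table.get(keyword, [])

def keywords_sharing_value(table, value):
    """List the keywords for which the function yields `value`."""
    return sorted(k for k, values in table.items() if value in values)

print(apply_function(HEAD_OF, "school"))
print(keywords_sharing_value(HEAD_OF, "captain"))
```

A dictionary built along these lines would store one such table per lexical function, so that the same relation can be looked up uniformly across otherwise unrelated entries.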

Distributional semantics seems, with hindsight, to be the structurally inspired 
approach with the biggest impact on contemporary lexicography. In the final decades 
of the twentieth century, major advances in the distributional approach to semantics 
were in fact achieved by applying a distributional type of meaning analysis to large 
text corpora, in the service of a practical lexicographical project. John Sinclair, who is 
the pioneer of the approach, developed his ideas (see Sinclair 1991) through his work 
on the Collins Cobuild English Language Dictionary (1987), for which a corpus of 20 million
words of contemporary English was compiled. An example (taken from Stubbs 2001: 15) 
may illustrate the basic idea. A classic example of homonymy in English is the item 
bank, which is either a financial institution, or an area of sloping ground, specifically 
the raised ground on the side of the river or underneath a shallow layer of water. The 
sets of words that these two exemplars of bank normally occur with show hardly any overlap.
Looking at compounds, on the one hand, and, on the other hand, at co-occurring 
items within a few words to the left or right of bank, Stubbs comes up with the follow- 
ing lists: 


bank account, bank balance, bank robbery, piggybank
cashier, deposit, financial, money, overdraft, pay, steal 


sand bank, canal bank, river bank, the South Bank, the Left Bank, Dogger Bank, Rockall
Bank, Icelandic Banks
cave, cod, fish, float, headland, sailing, sea, water 


The entities in the environment of the two homonyms appear to differentiate efficiently 
and effectively between the two meanings, and in that sense, a systematic analysis of the 
co-occurring items would appear to be an excellent methodological ground for lexical- 
semantic analysis. In theoretical terms, the essential concept here is that of collocation, 
defined as ‘a lexical relation between two or more words which have a tendency to co- 
occur within a few words of each other in running text’ (Stubbs 2001: 24). 
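
The notion of co-occurrence 'within a few words' can be illustrated with a short Python sketch that collects the words found in a fixed window around a node word. The two toy sentences, the function name `collocates`, and the span of three words are assumptions made for the example; they are not Stubbs's data or procedure.

```python
from collections import Counter

def collocates(tokens, node, span=3):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = max(0, i - span)
        window = tokens[left:i] + tokens[i + 1:i + 1 + span]
        counts.update(window)
    return counts

# Two invented mini-texts, one per homonym of "bank" (illustrative only).
money_text = "she went to the bank to deposit money and pay off her overdraft".split()
river_text = "they sat on the bank watching fish float in the water".split()

money_coll = collocates(money_text, "bank")
river_coll = collocates(river_text, "bank")

# Only a function word is shared between the two environments.
shared = set(money_coll) & set(river_coll)
print(sorted(shared))
```

In this toy case the content words around the two uses of bank do not overlap at all, which is the pattern the lists above display on a larger scale.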

In Sinclair’s original conception, a collocational analysis is basically a heuristic 
device to support the lexicographer’s manual work. (See further discussion in Hanks, 
this volume.) A further step in the development of the distributional approach was 
taken through the application of statistics. A decisive step was taken when Church 
and Hanks (1990), working in the context of Sinclair's Cobuild project, introduced 
the Pointwise Mutual Information index (defined in terms of the probability of occur- 
rence of the combination x,y compared to the probabilities of x and y separately) as a 
statistical method for establishing the relevance of a collocation. Once a statistical meas- 
ure of association in the form of the Pointwise Mutual Information index was intro- 
duced, further possibilities opened up for quantifying the distributional approach. For 
instance, a whole variety of association measures has been suggested and researched, 
among which Dunning’s log-likelihood ratio (1993) is one of the more popular ones. 
Also, the statistical turn in thinking about contextual distributions allowed for a rap- 
prochement with the field of information retrieval and Natural Language Processing 
where so-called word space models constitute an advanced form of distributional corpus 
analysis, applied to problems like word sense disambiguation and synonym extraction 
(see Agirre and Edmonds 2006a). 
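
The Pointwise Mutual Information index described above compares the probability of the combination x,y with the product of the separate probabilities of x and y, conventionally on a logarithmic scale. A minimal Python sketch follows; the base-2 logarithm, the estimation of probabilities from raw counts, and the counts themselves are assumptions made for illustration.

```python
import math

def pmi(count_xy, count_x, count_y, total):
    """Pointwise Mutual Information (base 2) from raw corpus counts.

    P(x, y) is estimated as count_xy / total, P(x) as count_x / total,
    and P(y) as count_y / total.
    """
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))

# Invented counts for a hypothetical corpus of 1,000,000 tokens.
score = pmi(count_xy=50, count_x=1000, count_y=2000, total=1_000_000)
print(round(score, 2))
```

A score well above zero indicates that the two words co-occur more often than their individual frequencies would lead one to expect; a score near zero indicates that the words occur together about as often as chance would predict.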

With the evolution towards a statistical, corpus-based methodology, distributional 
semantics has moved outside the realm of structuralist semantics, not just in the sense 
that it has morphed from a theory to a method, but also in the sense that a corpus 
approach shifts the structuralist focus on the linguistic system towards a focus on actual 
language usage as it may be encountered in large repositories of spontaneous language 
use—corpora, in other words. This shift has significant consequences for lexical seman- 
tic theory, as will be suggested in the final section. 


26.4 COGNITIVE SEMANTICS 


26.4.1 A Characterization of Cognitive Semantics 


Cognitive semantics emerged in the 1980s as part of cognitive linguistics, a loosely struc- 
tured theoretical movement that opposed the autonomy of grammar and the secondary 
position of semantics in the generativist theory of language. In contrast with structur- 
alist semantics, cognitive semantics takes a usage-based rather than a system-based 
approach to the description of meaning: what is to be described is not a relatively sta- 
ble semantic structure that is part of a more or less autonomous linguistic system, but 
the variable, contextualized use of language as it is embedded in human experience and 
interaction. Whereas structuralist semantics is typically looking for those aspects of 
meaning that specifically belong to the linguistic system, cognitive semantics embraces 
a maximalist position: one in which the distinction between semantics and pragmatics 
is not a major issue, in which language is seen in the context of cognition at large, and in 
which language use is the methodological basis of linguistics. 

In this subsection, three lexicographically relevant contributions of cognitive seman- 
tics to the study of word meaning will be briefly presented: the prototype model of cate- 
gory structure, the conceptual theory of metaphor and metonymy, and frame semantics. 
Introductions to cognitive linguistics at large include Evans and Green (2006), and 
Kristiansen et al. (2006); Geeraerts and Cuyckens (2007) is a multi-authored handbook. 

Prototype theory (see Taylor 2003 for an introductory text) as meant here is a cover 
term for the various aspects of a cognitive semantic view on the structure of lin- 
guistic categories. That view assumes that lexical concepts are repositories of world 
knowledge: the traditional structuralist distinction between linguistic semantics and 
encyclopaedic concepts cannot be upheld in a strict manner. Accordingly, rich, ‘ency- 
clopaedic’ forms of description will not be shunned. Further, prototype theory assumes 
that conceptual knowledge need not necessarily take the form of abstract definitional 
knowledge about a given category, but may also reside in knowledge about the members
of the category: our knowledge of what birds are in general may at least to some
extent be based on what we know about (typical) birds. This means that extensional 
forms of description will also be natural from a prototype-theoretical point of view. In 
addition, prototype theory highlights the fact that lexical polysemy takes the form of a 
multidimensional structure of semantic extensions starting from central readings, that 
is, categories are characterized by salience effects, in the sense that some readings have 
a stronger weight than others, and by multiple relations among those readings. Finally, 
prototype theory emphasizes that semantic structures may be fuzzy, in the sense that it 
may not always be easy to distinguish one meaning from the other. 

Conceptual metaphor and metonymy involve the observation that in a given language, 
metaphors and metonymies often occur in groups expressing the same underlying idea. 
The concept of anger, to name one of the best known examples, is often expressed by 
lexical metaphors involving heat, and more specifically heated fluids: to reach boiling 
point, to seethe with rage, to let off steam, to be a hothead, to fume, to be scarlet with rage, 
to explode with anger, to breathe fire, to make inflammatory remarks, to boil with anger. 
In Conceptual Metaphor Theory as introduced by Lakoff and Johnson (1980), such 
recurrent patterns are captured by labels like ANGER IS THE HEAT OF A FLUID IN A CONTAINER.
Such sets of expressions (one might call them ‘figurative lexical fields’) illustrate
the ‘cognitive’ aspect of cognitive semantics: rather than conventional expressions, 
conceptual metaphors and metonymies are patterns of thought that range across the 
lexicon. 

Frame semantics (Fillmore 1977; Fillmore and Atkins 1992, 1994, 2000) is the most 
articulate model with which cognitive semantics implements the idea that our knowl- 
edge of the world is organized in larger ‘chunks of knowledge’, and that language can 
only be properly understood against the background of that world knowledge. Frame 
theory is specifically interested in the way in which language may be used to perspec- 
tivize an underlying conceptualization of the world: it’s not just that we see the world 
in terms of conceptual models, but those models may be verbalized in different ways. 
To illustrate, we may have a look at the standard example of frame theory, the COMMERCIAL
TRANSACTION frame. The commercial transaction frame involves words like 
buy and sell. The commercial transaction frame can be characterized informally by a 
scenario in which one person gets control or possession of something from a second 
person, as a result of a mutual agreement through which the first person gives the sec- 
ond person a sum of money. Background knowledge involved in this scenario includes 
an understanding of ownership relations, a money economy, and commercial contracts. 
The categories that are needed for describing the lexical meanings of the verbs linked 
to the commercial transaction scene include Buyer, Seller, Goods, and Money as basic 
categories. Verbs like buy and sell then each encode a certain perspective on the com- 
mercial transaction scene by highlighting specific elements of the scene. In the case of 
buy, for instance, the buyer appears as the subject of the sentence and the goods as the 
direct object; the seller and the money appear in prepositional phrases: Paloma bought a 
book from Teresa for €30. In the case of sell on the other hand, it is the seller that appears 
as a subject: Teresa sold a book to Paloma for €30. 
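
The way in which buy and sell impose different perspectives on one and the same scene can be sketched in Python. The frame elements are those named above; the slot labels (`subject`, `object`, and the prepositions), the small table of past-tense forms, and the function `verbalize` are simplifying assumptions made for this sketch, not part of frame semantics as such.

```python
# Each verb maps grammatical slots onto frame elements of the commercial
# transaction scene. The slot labels are simplified assumptions.
PERSPECTIVES = {
    "buy":  {"subject": "Buyer",  "object": "Goods", "from": "Seller", "for": "Money"},
    "sell": {"subject": "Seller", "object": "Goods", "to": "Buyer",    "for": "Money"},
}

PAST = {"buy": "bought", "sell": "sold"}

def verbalize(verb, fillers):
    """Build a sentence for `verb` from a dict mapping frame elements to fillers."""
    slots = PERSPECTIVES[verb]
    parts = [fillers[slots["subject"]], PAST[verb], fillers[slots["object"]]]
    for prep in ("from", "to", "for"):
        if prep in slots:
            parts.extend([prep, fillers[slots[prep]]])
    return " ".join(parts) + "."

# One underlying event, described once and verbalized twice.
event = {"Buyer": "Paloma", "Seller": "Teresa", "Goods": "a book", "Money": "€30"}
print(verbalize("buy", event))
print(verbalize("sell", event))
```

The single `event` record stands for the shared background scene, while the two entries in `PERSPECTIVES` stand for the different ways in which the verbs highlight its participants.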


26.4.2 The Lexicographic Impact of Cognitive Semantics 


The impact of cognitive semantics on lexicography takes diverse forms: frame seman- 
tics is implemented in a large-scale electronic dictionary project, research into con- 
ceptual metaphor and metonymy has given rise to suggestions for a new treatment 
of figurative language, and prototype theory has engendered a fundamental recon- 
sideration of the concept of polysemy. Let us consider these three developments in 
that order. 

Frame semantics is lexicographically important because it is being implemented 
on a large descriptive scale in the Berkeley FrameNet project (Johnson et al. 2002; 
Ruppenhofer et al. 2006). Making systematic use of corpus materials as the main source 
of empirical evidence for the frame-theoretical analyses, FrameNet offers an electronic 
dictionary with frame-theoretical descriptions, similar in purpose and ambition to 
WordNet, but starting from a different descriptive framework: the Berkeley FrameNet 
project does for frame semantics what WordNet does for structuralist lexical relations, 
that is to use the model for building an online lexical database. 

Conceptual metaphor and metonymy studies suggest ways of dealing with the links 
between the senses of lexical items that go beyond common dictionary practice. 
Swanepoel (1992, 1998) and Van der Meer (2000, 2005), for instance, argue for devot- 
ing more explicit attention to the motivational link between core senses and figurative 
subsenses. Such motivational links could specifically involve conceptual metaphors in 
the Lakovian sense, or even image schemas. In the metalexicographical reflection on 
pedagogical lexicography, this suggestion has been developed by, among others, Moon 
(2004), Adamska-Sałaciak (2006), and Wojciechowska (2012). In the actual practice of 
lexicography, the Macmillan English Dictionary for Advanced Learners (2007, first pub- 
lished in 2002) incorporates ‘metaphor boxes’ showing, in a Lakovian vein, the concep- 
tual metaphors behind common expressions. 

At this point, we may also mention the influence of cognitive linguistics on a very spe- 
cific subdiscipline of lexicography, namely, on terminography, the study and description 
of professional and scientific terminology. Temmerman (2000) shows that the tenets of 
Wüster’s highly influential Vienna school of terminography (which is firmly based on 
structuralist principles) do not hold out when confronted with the way in which con- 
cepts are developed and terms applied in actual professional and scientific discourse. 
Temmerman’s analysis of biotechnological terminology demonstrates that all the lexical
and semantic phenomena that cognitive linguistics focuses on (like structured polysemy,
metonymy, and metaphor) occur in specialized terminologies just as much as in 
the general vocabulary. Importantly from the present point of view, she suggests how 
these insights into the structure and function of specialized terminologies may lie at 
the basis of new descriptive practices in terminography. This cognitive turn has in the 
meantime taken a firm footing in terminography: see Faber (2012). 

Prototype theory and, more generally, the cognitive linguistic view of polysemy and 
categorial structure pose a challenge for a structuralist understanding of the lexicon. 
In an approach that focuses on senses as part of a more or less autonomous linguistic
system, some of the phenomena that are typically highlighted by cognitive semantics are 
indeed out of place. 

As a first example, cognitive linguistics takes its starting-point in the idea that mean- 
ing in language cannot be separated from the other cognitive capacities of humans, that 
is, from their experience and knowledge in a broader sense. It follows that the distinc- 
tion between semantic information and encyclopaedic information becomes less clear 
than what a structuralist approach posits: references to typical examples of a category, or 
to characteristic features that do not necessarily apply to all the members of a category, 
are a natural thing to expect in dictionaries according to cognitive semantics. But from 
a structuralist perspective, these would be suspect intrusions of non-linguistic knowledge.
(See also the discussion in Hanks, this volume.) 

As a second example, cognitive linguistics suggests that it may be difficult to find a 
coherent set of criteria for establishing polysemy, and that accordingly, the distinction 
between the various senses of a lexical item is to some extent a flexible and context-based 
phenomenon. Dictionaries, then, are likely to use various definitional techniques to 
accommodate the flexibility of meaning. In other words, if cognitive semantics is right 
in suggesting that the description of meaning has to come to terms with fuzziness, 
demarcation problems, and non-uniqueness, we can expect dictionary definitions to 
use definitional methods that take into account these characteristics, such as enumera- 
tions, disjunctions, and the accumulation of near-synonyms. Again, from a structuralist 
perspective, such definitional techniques would be considered imperfect, in comparison
with a ‘strictly linguistic’ type of definition. 

Now, it does not require an extraordinary familiarity with actual dictionaries to 
observe that such features (enumerations, disjunctions, reference to exemplars, etc.) 
do indeed occur. The point to make is rather that from a cognitive semantic perspec- 
tive these aspects of dictionaries are the natural consequence of the nature of semantic 
phenomena, rather than being imperfections that need to be improved. As such, it has 
been argued that prototype theory and the cognitive semantic conception of categorization
constitute a theoretical framework that fits lexicography much better than 
a structuralist system-oriented approach (Geeraerts 1990, 2007; Hanks 1994; and see 
also Kilgarriff 1997b). In other words, the impact of prototype theory on lexicography 
is to provide a suitable theoretical framework for an existing practice, rather than to 
change it radically. 


26.5 PROSPECTS 


The historical panorama of the links between theories of lexical semantics and lexicogra- 
phy shows that with each of the major stages in the development of theoretical research, 
newly formed theories provided practical lexicography with specific suggestions, 
models, and initiatives. Does this active exchange mean, however, that we generally have 
a symbiotic interaction of the type that we described for the historical-philological stage 
in the development of lexical semantics? In general, that is not the case, because a sym- 
biosis presupposes a close, two-way interaction, one in which dictionary projects work 
within a clearly defined theoretical framework, and in which the theoretical concepts 
are at the same time permanently tested and refined through a large-scale descriptive 
effort. Although we have mentioned a few projects going in that direction, such a close 
interaction is generally not the case. On one side, most dictionary projects are guided 
by functional (and financial) rather than theoretical considerations. On the other side, 
theoretical lexical semantics all too often operates on a restricted descriptive basis in 
which the analysis of choice examples and case studies takes precedence over a confron- 
tation with big data. 

At the same time, if we look at the broader pattern of the evolution sketched above, it 
does seem that we are moving into a constellation in which a higher degree of cross-fer- 
tilization is within reach. There are two points of convergence. On the theoretical side, 
as we suggested in the previous section, a cognitive semantic conception of word mean- 
ing seems to be more congenial to typical lexicographical data than a structuralist one. 
Specifically, defining linguistics as a usage-based enterprise requires a closer scrutiny 
of actual usage, which is precisely the kind of massive descriptive endeavour that defines
lexicography. On the methodological side, major advances in corpus linguistics (not just
in the mere presence of abundant usage data, but specifically also in the form of tools
for digitally exploring those data) support data-driven lexicography and usage-based 
semantics alike. 

In fact, the most recent developments in cognitive semantics precisely involve 
the adoption of the distributional corpus tools whose emergence was sketched 
above: younger scholars within cognitive linguistics (as witnessed by the contribu- 
tions in Gries and Stefanowitsch 2006 or Glynn and Fischer 2010) are combining 
the theoretical framework of cognitive semantics with the descriptive methods of 
quantitative corpus analysis. The corpus approach is attractive because it provides an 
unparalleled empirical basis for lexical research. The wealth of data contained in the 
corpora—regardless of the perspective from which they are analysed—will simply 
benefit any research endeavour in lexical semantics, cognitive semantics no less so 
than other approaches. But more specifically and more importantly, there is a cer- 
tain theoretical affinity between cognitive semantics and the distributional analy- 
sis of corpus data. Both approaches are explicitly usage-based ones. It is difficult to 
see how cognitive semantics can live up to its self-declared nature as a usage-based 
model if it does not start from actual usage data and a methodology that is suited 
to deal with such data. And at the same time, distributional corpus analysis takes a 
radical usage-based rather than system-based approach: it considers the analysis 
of actual linguistic behaviour to be the ultimate methodological foundation of lin- 
guistics. In the linguistic climate of the 1970s, when the scene of grammatical theory 
was dominated by the introspective methodology of Chomskyan linguistics, such a 
usage-based approach went against the grain of the prevalent opinions, but with the 
advent of cognitive linguistics as an explicitly usage-based approach, the perspective 
definitely changes. 

Are we looking forward to a future, then, in which a constant confrontation with the 
facts of linguistic usage draws lexicography and lexical semantics together? The pros- 
pects definitely look good. 


27.1 INTRODUCTION 


Synonymy, hyponymy, meronymy, antonymy (or opposition), and contrast (includ- 
ing co-hyponymy) are the best-known and most-discussed of the paradigmatic, 
lexical-semantic relations, or sense relations. Table 27.1 gives working definitions of the 
relations (which are problematized later), shows reciprocal relations within the termi- 
nology, and offers shorthand for representing the relations. 

These are lexical-semantic relations in that they relate word meanings; that is, for any 
word, each of its senses stands in a unique set of relations to particular senses of other 
words. They are paradigmatic relations in that the set of related meanings forms a 
paradigm of potentially substitutable words. This notion of substitutability means that 
paradigmatic sense relations are usually considered only within a single grammatical cat- 
egory. Thus one can replace the highlighted words in (1) with synonyms in order to create 
a paraphrase, as in (2). To be more specific, one can replace the words with hyponyms (3) 
or meronyms (4), and to refer to an opposite situation, one can use an antonym (5). 


(1) The essay was excellent. 

(2) The paper was superb. [synonyms, paraphrase of (1)] 

(3) The dissertation was excellent. [hyponym, more specific than (1)] 

(4) The conclusion was excellent. [meronym, more specific than (1)] 

(5) The essay was terrible. [antonym, contrary to (1)] 


Table 27.1 Summary of sense relation terminology 

X is a synonym of Y   = X means the same as Y      = Y is a synonym of X
X=Y                                                  Y=X

X is a hyponym of Y   = X denotes a type of Y      = Y is a hyperonym of X
X<Y                                                  Y>X

X is a meronym of Y   = X denotes a part of Y      = Y is a holonym of X
X<Y                                                  YX

X is an antonym of Y  = X means the opposite of Y  = Y is an antonym of X
X/Y                                                  Y/X
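
The reciprocal terminology summarized in Table 27.1 can be restated as a small lookup in Python. This is only an illustrative sketch: the function names are invented here, and treating the word pair from examples (1) and (3) as a fixed fact is a simplification.

```python
# Reciprocal terms as given in Table 27.1.
RECIPROCAL = {
    "synonym": "synonym",
    "hyponym": "hyperonym",
    "hyperonym": "hyponym",
    "meronym": "holonym",
    "holonym": "meronym",
    "antonym": "antonym",
}

def reciprocal(x, relation, y):
    """Given 'x is a <relation> of y', return the reciprocal statement as a triple."""
    return (y, RECIPROCAL[relation], x)

def is_symmetrical(relation):
    """A relation is symmetrical if it is its own reciprocal."""
    return RECIPROCAL[relation] == relation

# 'dissertation is a hyponym of essay' follows examples (1) and (3).
print(reciprocal("dissertation", "hyponym", "essay"))
print([r for r in RECIPROCAL if is_symmetrical(r)])
```

The second line of output singles out synonymy and antonymy, the two relations in the table whose reciprocal carries the same name.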


These relations are the building-blocks of definition in standard (semasiological) dictionaries
and the subject of relational (onomasiological) lexicographical works. In keeping
with the marketing of alphabetically organized synonym-finders as ‘thesauruses’, 
thesaurus here refers to any lexicographical work whose subject is relational informa- 
tion and not definition. 

This chapter first introduces general issues for sense relations, then divides the vari- 
ous relations into two supercategories: asymmetrical and symmetrical relations. Within 
these supercategories, major categories are discussed in detail, including: 


(a) definition of the category and subcategories, applicability to different grammati- 
cal categories, and conflicts between logical and everyday uses of the terms; 

(b) relevance of the category to lexicography, including use in sense-definition and 
the relational information given in thesauruses. 


Tools for the automatic identification of these relations are briefly discussed at the end of 
the chapter. 

Sense relations are generally treated as within-language relations; translational 
equivalents are not usually called synonyms, for example. I thus restrict attention here to 
monolingual lexicography. The relations that have merited -onym names are particularly 
interesting for lexicography because of their (alleged) logical properties and their gener- 
alizability across a large range of vocabulary. Other relations, like those between an activ- 
ity (ballet) and associated things (barre, tutu) or people (ballerina), are not treated here. 


27.2 SENSE RELATIONS, GENERALLY 


27.2.1 Sense Relations or Word Relations? 


Colloquially, we speak of the synonym or antonym (etc.) for ‘a word’, but it is usually
more accurate to speak of sense relations, since different senses of a single word enter
into relations with different sets of other words. For example, the senses of house used
in astrology and music are not hyponyms of building. But the notion of SENSE RELA- 
TION is problematic because the number of a word’s senses is arguably indeterminate. 
For instance, if have has a specific sense ‘eat’ (I had a burrito), then it is a sense-synonym 
for eat, ingest, etc. But if have in this context involves a more general sense ‘perform 
the appropriate action upon’ (in which case the same sense is used in have a shower or
have a look), then eat is a hyponym of that sense, rather than a synonym. For lexicogra- 
phy, sense relations can help determine where sense boundaries should be made firm. 
For instance, since hot has different synonyms (warm vs. spicy) and antonyms (cold 
vs. mild) in different contexts, we can conclude it should be defined with two discrete 
senses: ‘high in temperature’ and ‘creating a burning sensation when tasted’. Here, some
relations are more relevant than others. If two uses of a word have different hyperonyms, 
that is, they denote different types of things, then sense differentiation is necessary. But 
if different synonyms can be offered on the basis of nuances of denotation or connota- 
tion, the need for separate senses is less clear. 
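
The use of relations to locate sense boundaries can be sketched computationally. The Python fragment below is a minimal illustration rather than a lexicographical tool: the example phrases are invented, and the function name `group_uses_by_relations` is hypothetical. It groups uses of hot by the synonym and antonym that each use accepts, so that each distinct pairing suggests a candidate sense.

```python
from collections import defaultdict

# Each record: (use of "hot", acceptable synonym, acceptable antonym).
# The synonym/antonym pairings follow the text; the phrases are invented examples.
USES = [
    ("hot bath", "warm", "cold"),
    ("hot weather", "warm", "cold"),
    ("hot curry", "spicy", "mild"),
    ("hot salsa", "spicy", "mild"),
]


def group_uses_by_relations(uses):
    """Group uses that share the same (synonym, antonym) signature."""
    groups = defaultdict(list)
    for phrase, synonym, antonym in uses:
        groups[(synonym, antonym)].append(phrase)
    return dict(groups)


candidate_senses = group_uses_by_relations(USES)
for signature, phrases in candidate_senses.items():
    print(signature, phrases)
```

Two signatures emerge, matching the two senses proposed for hot. Grouping on hyperonyms rather than synonyms would give firmer evidence, since differing hyperonyms make sense differentiation necessary while differing synonyms may reflect only nuance.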

This is not to say that different senses of a word cannot share an antonym or hypero- 
nym, however. For instance, where senses are metaphorically related, relations can carry 
over from literal to metaphorical use: branches works as a meronym of both botanical 
trees and family trees. Morphological relatives can also cover multiple senses. Untie is an 
opposite for four definitions of tie in American Heritage Dictionary (AHD): 


(6) tie 
1. To fasten or secure with or as if with a cord, rope, or strap: tied the kite to a post; 
tie up a bundle. 
2. To fasten by drawing together the parts or sides and knotting with strings or 
laces: tied her shoes. 
3. a. To make by fastening ends or parts: tie a knot. 
b. To put a knot or bow in: tie a neck scarf.


Semanticists increasingly dismiss the possibility of fully elaborated senses in the men- 
tal lexicon—word meaning is always construed and elaborated in context, and thus 
sense relations should be similarly context-dependent. But, again, context-dependence 
should not be overstated. While the opposite of large is not always small, large/small are 
considered to be ‘better’ antonyms than large/little or expansive/small when considered 
without reference to a particular context. The intuition that large/small are better antonyms
is supported by their rates of co-occurrence and people's behaviour with them in
experiments like free word association. This leads to the hypothesis that some relational 
pairs (or sets) are conventionalized and learnt as units. Murphy (2006) proposes that 
conventionalized opposites are represented in the mental lexicon as discontinuous lexi- 
cal items. Conventionalization provides some motivation for the extension of relations 
across sense boundaries, as when the relation between hard and soft is extended to cre- 
ate software as a complement for hardware, even though hardware is physically hard and 
software is not physically soft. In dictionaries and thesauruses, however, the processes 
that lead to the construal and conventionalization of relations are not very relevant, and
the user must be given a clear indication of which relations go with which senses. This 
means, for example, that the antonym hard is repeated in two of the twelve sense entries 
for soft in Oxford American Writer's Thesaurus, and the synonym muted is shared across 
three senses. 


27.2.2 Sense Relations in Linguistic Theory 
and Lexicography 


The study of sense relations had its peak in the structural semantics of the mid-twentieth
century, following associationist (Saussure 1959/1915; Bally 1940) and Wortfeld (Trier
1934) movements. Semantic field approaches came late to the English-speaking world, 
where they were developed by Lyons (1963) and Lehrer (1974). They present the seman- 
tic lexicon as internally structured by synonymy, contrast, and hyponymy. Figure 27.1 
represents the semantic field of American English cooking verbs. 

Here contrast relations are read along the horizontal axis and hyponym relations on 
the vertical axis. Thus, poach and steam contrast and are hyponyms of simmer. French-fry 
and deep-fry share a semantic space, showing their synonymy. Shaded areas indicate 
overlap, or context-dependent synonymy. For example, roast and bake can refer to the 
same process (e.g. oven-cooking a ham), but in most situations they are not semantically
equivalent (bake bread ≠ roast bread). These approaches operate on a ‘thesaurus
metaphor’: senses are not defined, but emerge from a pattern of associations among 
words/senses. 

The thesaurus metaphor can be opposed to the dictionary metaphor, in which senses 
are built from semantic components or primitives (e.g. Katz and Fodor 1963; see Murphy 
2010 for a review of historical and current approaches). These approaches derive sense 
relations from words’ meaning components: synonyms share key components, anto- 
nyms differ in one key component, hyponyms have all the same components as their 
hyperonyms plus at least one more. Most modern models of lexical meaning, like actual 
dictionaries and thesauruses, fall between these extremes, acknowledging the role that 
relations play in determining the development and construal of senses. 
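
The componential derivation of sense relations can be made concrete with a small Python sketch. The feature sets below are invented for illustration (they are not Katz and Fodor's analyses), the function name `relation` is hypothetical, and every listed component is treated as ‘key’. The sketch encodes only the three rules just stated: identical components give synonymy, a proper superset of components gives hyponymy, and a difference in exactly one component value gives antonymy.

```python
# Toy componential lexicon: each sense maps components to values.
# Components and values are illustrative assumptions, not a published analysis.
SENSES = {
    "adult": {"human": True, "adult": True},
    "man": {"human": True, "adult": True, "male": True},
    "woman": {"human": True, "adult": True, "male": False},
    "sofa": {"furniture": True, "seats_several": True},
    "couch": {"furniture": True, "seats_several": True},
}


def relation(x: str, y: str) -> str:
    """Name the relation of sense x to sense y under the componential rules."""
    a, b = SENSES[x], SENSES[y]
    if a == b:
        return "synonym"
    # dict item views are set-like: '>' tests for a proper superset.
    if a.items() > b.items():
        return "hyponym"
    if a.items() < b.items():
        return "hyperonym"
    # Same components, exactly one differing value.
    if a.keys() == b.keys() and sum(a[k] != b[k] for k in a) == 1:
        return "antonym"
    return "unrelated"


print(relation("sofa", "couch"))
print(relation("man", "adult"))
print(relation("man", "woman"))
```

A model of this kind makes the relations fall out of the definitions, which is exactly what the thesaurus metaphor denies; it also shows why the approach struggles wherever the choice of ‘key’ components is itself contestable.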


[Figure 27.1: field diagram of cooking verbs; legible labels include steam, simmer, boil, sauté, deep-fry, French-fry, and grill]

FIGURE 27.1 Semantic field of American English cooking verbs (adapted from Lehrer 
1974: 31) 
The lexical-semantic boom in the 1960s and 1970s should have encouraged more onomasiological
dictionaries, writes Béjoint, but ‘this theoretical support was not enough,
and onomasiological dictionaries have remained a minor genre, generally neglected 
by dictionary users and metalexicographers’ (2010: 21). McArthur (1998a) describes 
the Longman Lexicon of Contemporary English as an exception—a lexicographical 
response to structural semantics. But in keeping with Béjoint’s observation, it never had 
the success of its alphabetical, semasiological counterpart, the Longman Dictionary of 
Contemporary English. 

Use of corpus methodologies in linguistics (following lexicography) has led to more 
interest in the roles of syntagmatic relations (collocates, colligations) in determining 
meaning. Nevertheless, attention to paradigmatic relations is emerging in cognitive lin- 
guistics, particularly for synonymy (Geeraerts 2010b) and antonymy (Jones et al. 2012). 


27.3 TYPES OF RELATIONS: ASYMMETRICAL 
RELATIONS 


Hyponymy and meronymy are asymmetrical, or hierarchical, relations; that is, if A is a 
type (part) of B, B cannot be a type (part) of A. The lack of symmetry makes these rela- 
tions important for definition, since they can describe the definiendum without intro- 
ducing circularity. 

Hyponymy and meronymy directly reflect non-linguistic relations between deno- 
tata. Coffee is a hyponym of drink because coffee is a subcategory of drink, and java, as 
coffee’s sense-synonym, is equally a hyponym of drink. Definition by hyponymy is only
preferred in the former case, however. Informal, technical, or dialectal headwords are 
typically defined by synonyms and definition by asymmetrical relations is reserved for 
neutral terms, as shown in Figure 27.2. 


27.3.1 Hyponyms and Hyperonyms 


27.3.1.1 Defining the Relations


Hyponyms, or subordinate terms, relate to hyperonyms (superordinate terms) in a 
taxonomy; trout is a hyponym of fish and fish is a hyperonym of trout. (While hypernym
is often used, hyperonym preserves the -onym ‘name’ etymon and contrasts more clearly 
with hyponym in speech.) Hyponymy is often defined as unidirectional entailment: a 
dog is necessarily a mammal, but a mammal is not necessarily a dog. From a definitional 
perspective, a hyponym’s sense includes its hyperonym’s sense. The meaning of dog thus 
includes all of the properties defined for the meaning of mammal. However, strict defi- 
nition of hyponymy as inclusion runs against our sense of hyponymy as a paradigmatic 
KIND-OF relation. For example, all professors are mammals, but it’s odd to say professors 
are a kind of mammal. And the meaning of teach is contained in the definition of teacher,
but teacher and teach fail the substitution test for paradigmatic relations. So, our initial
diagnostic tool for determining if X is a hyponym of Y is to ask: ‘Is X a kind of Y?’


coffee | kafé; 'kafé|
noun

1 a drink made from the roasted and ground beanlike seeds of a tropical shrub, served hot or iced : a
cup of coffee | [as adj.] a coffee pot.


java | java; 'java|
noun informal
coffee


Figure 27.2 Hyperonym definition of coffee versus synonym definition of java (Apple
Dictionary 2.2.1)

Competing taxonomies of hyponymy relations exist (see Murphy 2003 for a survey).
The most common distinction is between the taxonomic IS-A-KIND-OF relation,
which Cruse (1986) calls taxonymy, and the functional IS-USED-AS-A-KIND-OF relation.
For example, wheat is in a taxonomic relation to grass, but in a functional relation
to crop. Functional relations are more tenuous because they are not logically neces- 
sary: wheat need not be grown as a crop. Taxonomic relations, in contrast, are logically 
necessary: non-grasses cannot be wheat. Still, taxonomies can compete: the taxonomy 
that classes pine as a type of evergreen contrasts with the one that classes it as a type 
of tree, since evergreens are neither a tree-type (cf. evergreen shrubs), nor are trees an 
evergreen-type (cf. deciduous trees).

Using the KIND-OF diagnostic, hyponymy is most clearly exemplified as a relation
among nouns: organism > plant > tree > pine > Scots pine. Such hierarchies are generally 
organized around a basic level (Rosch 1978) that is most perceptually salient. Basic-level 
terms are typically the simplest and most frequent expressions in the hierarchy (tree vs. 
organism or Scots pine).
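
The noun hierarchy organism > plant > tree > pine > Scots pine can be modelled as a chain of hyperonym links. The following Python sketch is a simplification under stated assumptions: each term is given a single hyperonym (which ignores the competing taxonomies noted above, such as pine as a type of evergreen), and the names `HYPERONYM_OF`, `hyperonym_chain`, and `is_hyponym` are hypothetical. It illustrates two properties of the relation: it is transitive, and it is asymmetrical.

```python
# Immediate hyperonym of each term, taken from the chain in the text.
# A single-parent mapping is an assumption; real taxonomies can compete.
HYPERONYM_OF = {
    "Scots pine": "pine",
    "pine": "tree",
    "tree": "plant",
    "plant": "organism",
}


def hyperonym_chain(term):
    """Return all hyperonyms of a term, nearest first."""
    chain = []
    while term in HYPERONYM_OF:
        term = HYPERONYM_OF[term]
        chain.append(term)
    return chain


def is_hyponym(x, y):
    """True if x is a (possibly indirect) hyponym of y."""
    return y in hyperonym_chain(x)


print(hyperonym_chain("pine"))
print(is_hyponym("Scots pine", "plant"))
print(is_hyponym("plant", "Scots pine"))
```

Because `is_hyponym` only ever walks upward, a term can never be a hyponym of its own hyponym, which is the asymmetry that makes the relation safe for non-circular definition.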

In order to apply hyponymy to other grammatical categories, the KIND-OF diagnostic 
needs adjustment. Lyons (1977) suggests using in a certain way, as in (7)-(8). For verbs, 
manner hyponyms as in (7) are sometimes called troponyms (Fellbaum 1998b).


(7) To strut is to walk, in a certain way. 
(8) To be generous is to be kind, in a certain way. 


The certain way test is less clear for scalar adjectives, where to a certain degree might be 
preferred. 


(9) To be ajar is to be open to a certain degree/?in a certain way.

While ajar is a KIND-OF open, in that being ajar entails being open but not vice versa,
such adjectives are often overlooked in discussions of hyponymy; in thesauruses, they 
are often presented as synonyms, rather than in the hierarchical relation indicated by 
the test in (9). As one goes up the taxonomical tree for adjectives, adjectival hypero- 
nyms are harder to find. For example, no adjective adequately covers the co-hyponyms 
good, bad, and mediocre, leaving the noun merit as the superordinate term. Lyons (1977) 
termed these cross-categorial relations quasi-hyponymy. 


27.3.1.2 In Dictionaries 


Van Sterkenburg (2003c: 130) describes hyponymy as ‘the most important semantic 
relationship’ and ‘basis of our knowledge management’. It is central to the Aristotelian
standard of definition: a word is defined using a hyperonym (the genus expression, 
underscored in (10) and (11)) and differentiae (in brackets) to distinguish the sense from 
its co-hyponyms. This is easiest for nouns, as in (10), and some verbs, (11): 


(10) kremlin a _citadel_ [within a Russian town] (Oxford Dictionary of English [ODE])
(11) dot _mark_ [with a small spot or spots] (ODE)


Adjectives tend to get their value through contrast rather than hierarchy and therefore 
hyperonym definitions are rarer, but possible: 


(12) generous liberal [in giving or sharing] (AHD) 


27.3.1.3 In Thesauruses 


Hyponyms and hyperonyms are mainstays of thesaurus entries, although they tend not 
to be marked as such. For example, this entry for book has synonyms, hyponyms, and 
[hyperonyms] intermingled: 


(13) book n. volume, tome, tract, text, novel, manual, [work], [publication] (Penguin 
Dictionary of English Synonyms and Antonyms) 


Occasionally, thesauruses mark hyponym relations explicitly. Chambers Thesaurus
(2004: v) uses panels of text to present ‘word families’, which they call hyponyms,
although some of these panels concern ‘parts of’ or ‘terms related to’. At tree we see
the contrast between taxonyms, presented with include, ‘Trees include: acacia, acer, ...’,
and non-taxonymic hyponyms, presented with types of: ‘Types of tree: bonsai, conifer, ...’.
The non-taxonyms are not necessarily incompatible with one another.
Semantic-field-appropriate hyponym descriptors are sometimes used: fields of biology,
items of cutlery, breeds of horses. Most of these panels occur at noun entries, but there
are occasional panels at verbs (‘Ways of walking include’).

But the explicit treatment of hyponymy in special panels has not kept Chambers from 
presenting hyponyms in the synonym entries, like other thesauruses. For instance, the 
entry for emotion starts with synonyms (‘feeling, passion, sensation, ...’) and shifts to 
hyponyms within the same comma-delimited paragraph (‘... happiness, ecstasy, sadness,
sorrow, grief ...’).


27.3.2 Meronyms and Holonyms 


The part–whole or HAS-A relation is meronymy: handle is a meronym of mug and mug
is the holonym of handle. Various types of HAS-A relation can be identified, such as
WHOLE > FUNCTIONAL COMPONENT (mug > handle), WHOLE > SEGMENT (week > day), 
COLLECTION > MEMBER (pod > whale), and WHOLE > SUBSTANCE (sea > water) (Chaffin 
1992). Cruse (1986) offers the meronym diagnostic X and other parts of Y. This works 
best for wholes that have different types of parts; so mug > handle is a case of meronymy, 
while pod > whale, week > day, and sea > water display quasi-meronymy. 

In semantics, meronymy is generally not as central as other -onym relations, since 
it is not a logical relation. Many parts are optional (a doorless building is still a build- 
ing) and the same part-names often apply to different types of wholes—a door is 
generally a part, but its whole varies: a building, a refrigerator, an advent calendar. 
Nevertheless, meronymy is one of the most important relations in definition. Part is 
the second-most-common noun in dictionary definitions of English nouns, after act 
(Smith 1985): ‘it’s difficult to define the part without mentioning the whole. On the 
other hand, the part is only occasionally referred to in the definition of the whole’ 
(Atkins and Rundell 2008: 136). This is less true of quasi-meronyms: whale is not 
defined using pod, but pod is often defined as ‘a group of whales’.

Like hyponymy, meronymy is most closely associated with nouns. The relation is not 
relevant to underived adjectives whose meanings are essentially too simple for division 
into parts, but it is relevant to verbs as proper temporal inclusion (or non-troponymic 
inclusion; Fellbaum 1998b). For example, the parts of eating include chewing and swallowing
and these are used in defining eat: ‘put (food) into the mouth and chew and swallow
it’ (ODE).

Meronymy is less commonly part of traditional thesaurus entries, but is increasingly
found in larger resources. Some lists of parts are included in Chambers
Thesaurus’s ‘word families’ (see Section 27.3.1.3). Thinkmap Visual Thesaurus presents
the PART-OF relations available in Princeton WordNet (see Section 27.5.1), but
these are fairly incomplete and irregular. Paw, for example, is linked by a PART-OF
relation to canid and felid, but not to other types of animals with paws (e.g.
rodents).


27.4 ‘SYMMETRICAL’ RELATIONS: SYNONYMY
AND OPPOSITION 


Synonymy and relations of oppositeness (antonymy) or contrast (co-hyponymy) are 
said to be logically symmetrical, in that if X is a synonym/antonym/co-hyponym of Y, 
then Y is just as much a synonym/antonym/co-hyponym of X. Natural language, how- 
ever, is rarely so logical. Thesauruses commonly give fresh as the antonym for stale, for 
example. But the antonym of fresh is stale only when applied to bread, cake, and jokes, 
and not when applied to meat, vegetables, paper, or laundry. While we could take this 
as evidence that fresh has as many different senses as it has opposites, this would multi- 
ply its dictionary senses to an unreasonable number, as we would need senses that are 
defined essentially as ‘not stale’, ‘not rancid’, ‘not wilted’, ‘not mouldy’, ‘not frozen’, ‘not
dried’, etc. Instead, we need to represent a relation of imperfect symmetry: fresh has a
more general sense and a wider range of application than its opposites have, and thus it 
maps onto more antonyms and synonyms than its antonyms do. 
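
The imperfect symmetry of fresh and its opposites can be shown with a small Python sketch. The pairings are those named above; the dictionary layout and the function names `is_reciprocal` and `is_exclusive_pair` are assumptions made for illustration, and the sketch simplifies by giving each opposite of fresh no other antonym.

```python
# Antonym listings as a thesaurus might record them. The opposites of
# "fresh" come from the text; the data layout is an illustrative assumption.
ANTONYMS = {
    "fresh": {"stale", "rancid", "wilted", "mouldy", "frozen", "dried"},
    "stale": {"fresh"},
    "rancid": {"fresh"},
    "wilted": {"fresh"},
    "mouldy": {"fresh"},
    "frozen": {"fresh"},
    "dried": {"fresh"},
}


def is_reciprocal(x, y):
    """True if each word lists the other among its antonyms."""
    return y in ANTONYMS.get(x, set()) and x in ANTONYMS.get(y, set())


def is_exclusive_pair(x, y):
    """True only if each word has the other as its sole antonym."""
    return ANTONYMS.get(x) == {y} and ANTONYMS.get(y) == {x}


print(is_reciprocal("fresh", "stale"))
print(is_exclusive_pair("fresh", "stale"))
print(len(ANTONYMS["fresh"]), len(ANTONYMS["stale"]))
```

The pair is reciprocal but not exclusive: stale points only to fresh, while fresh fans out to six opposites. That difference in fan-out is what a single general sense of fresh has to accommodate.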

The relative symmetry of these relations nevertheless means that they are less useful 
than hyponymy and meronymy in writing definitions (though they certainly are used) 
and much more the stuff of thesauruses. The symmetrical relations are also less particu- 
larly associated with nouns. They are found in all content-word categories and arguably 
even among function words—for instance, we might consider a a synonym of one and 
the opposite of the. 


27.4.1 Synonyms and Near-synonyms 


27.4.1.1 Defining the Relation 


Synonymy is either the rarest or most abundant relation, depending on how one defines 
it. The most restrictive definition, S1, treats synonymy as a relation among words, rather 
than senses: 


S1: sameness of meaning and use across all senses (absolute synonymy) 
anyone=anybody: have the same two senses: (1) any person and (2) someone of
importance, as in anybody₁ who’s anyone₂. Their equivalence is clear when they are
swapped: anyone₁ who’s anybody₂.


Limiting ourselves to synonymy as a sense (rather than word) relation, definitions S2–S4
are each sometimes understood as the meaning of synonymy. 


S2: sameness of meaning and use; complete substitutability in every possible context for
that sense (perfect synonymy) 
crucifer=brassica: denote members of the cabbage family and are substitutable in the
‘cabbage’ sense. They are prevented from absolute synonymy by crucifer’s sense ‘a
person carrying a cross in a procession’, which brassica does not share. The two have
a similar register, and thus are close in non-denotational meaning as well. 


S3: sameness of denotative meaning, but variation on connotative, affective, or social 
dimensions (cognitive synonymy) 


peritonsillar abscess=quinsy: denote the same illness, but differ in register and argua- 
bly dialect, with quinsy being more colloquial and old-fashioned and more common 
in British English than American. 


S4: similarity of meaning; context-specific substitutability (near-synonymy)
to pacify~to placate: may be applied to the same situation, as in She pacified/placated
the crowd by singing, but the senses are different enough that they are not always substitutable.
For instance, one could pacify a barking dog by knocking it unconscious,
but this would not placate it.


Each of these definitions is more forgiving than the last, and in the cases of S1 and S2 it
is unclear that true content-word examples can be found. The example of crucifer/bras- 
sica is arguably flawed by the fact that crucifer and its derivatives are more common in 
American English than in British, and so it might carry different social meaning about 
the speaker or the context than brassica would. 

The difficulty in finding absolute or perfect synonyms is a product of the communicative
purpose of language. If we must choose between word A and word B, we seek
reasons for doing so. So even if A and B mean the same thing now, it is only a matter of
time before they develop differences in sense, connotation, or social meaning. Harris
(1973: 12-13) goes so far as to say ‘If we believe there are instances where two expressions
cannot be differentiated in respect of meaning, we must be deceiving ourselves’.

Near-synonymy is the staple of thesauruses and what people usually mean when they 
say synonym. Apresjan (1973: 175) thus defines synonyms as ‘words which designate the 
same thing but emphasize different aspects of it, or as words which have the same meaning,
but differ in its finer shades’. The result is that what counts as synonymy in everyday
life is a far stretch from the philosopher's ideal of synonymy as a symmetrical relation of 
identity. Instead, synonym in the popular (and lexicographical) treatment is any relation 
that involves similarity rather than contrast. For instance, in the context of talking about 
schools, it is acceptable to treat children and pupils as synonyms, but in other contexts 
pupils can only be understood as a hyponym of child, not another word for child. The 
context (and the extension of the term within the context) is key. 


27.4.1.2 In Definitions 


Definition by synonym plays a ‘useful complementary role’ (Atkins and Rundell 
2008: 421) in definitions, but rarely takes the place of a more elaborated definition, since 
(a) no two words are exactly alike and (b) mutual definition by synonymy would lead 
to uninformative circularity. Zgusta (1971: 262) recommends synonym definitions for 
‘obsolete, dialectal, colloquial, vulgar etc. lexical units ..., provided there is a connotatively
neutral synonym’. In such cases usage labels can relate the synonyms’ differences.
Many dictionaries, for instance, define gay as ‘homosexual’, though we may question
which (if either) is now the connotatively neutral synonym. The direction of definition 
in this case is useful, however, since to define homosexual as ‘gay’ would leave ambiguity 
as to which sense of gay was intended. 


27.4.1.3 In Thesauruses 


The proliferation of near-synonymy creates demand for two types of informa- 
tion: synonym identification and synonym differentiation. Differentiation was the 
subject of the earliest English dictionary of synonyms (see Béjoint 2010). These days, 
differentiation is sometimes provided in special panels in dictionaries (e.g. AHD) or
thesauruses (e.g. Chambers Thesaurus), but thesauruses that list synonyms are more 
common. 

Thesauruses tend to present one synonym as the most neutral or ‘normal’ member 
of the category. This key synonym is aptly described by a number of synonyms: central 
synonym, synonym of preference, prototypical synonym (van Sterkenburg 2003c), or 
core synonym. So, for example, thesauruses generally present die as the primary term, 
linking pass away, perish, kick the bucket, and so forth to die rather than leading with 
pass away. Die serves here as the most generally applicable term as well as the least 
connotatively or socially informative one. Organization around a privileged synonym 
can be clearer in electronic databases, such as WordNet (see Section 27.5.1), where 
words are organized into synonym sets that centre on a primary node. This notion of 
centrality is less obvious in alphabetical resources, but hinted at by the lack of reci- 
procity in some listings. For instance, while perish, pass away, and kick the bucket may 
be offered as synonyms at die, no ‘dying’ sense is offered at kick and no separate entry 
is given for pass away or kick the bucket. This preferential treatment of register-neutral 
terms can be a problem, since, as found in Murphy (2013), thesaurus-using writers are 
often looking for more formal words to replace the informal or crude words that they 
had in mind. 

Moreover, the die entry can be expected to be much longer than those of more periph- 
eral synonyms like perish and decease. The relation between a central synonym and its 
near-synonyms is often such that the near-synonyms are more informative (i.e. arguably
hyponyms) or differently informative (instantiations of a prototype [see Taylor 2003], or 
overlapping co-hyponyms). 

Sense differentiation in alphabetical thesauruses ranges from format-only marking 
(e.g. paragraph breaks) to elaboration by definition or example sentence. More recent 
thesauruses (e.g. Chambers Thesaurus) mark other types of differentiation, such as dif- 
ferences in register. Failure to appreciate these differences and boundaries results in the 
stereotypically bad thesaurus-speak in which people ingest their repast instead of eat- 
ing their dinner. The amount of semantic slippage in thesaurus entries is such that Ron 
Hardin (reported in Church et al. 1994) found that synonym paths between words and 
their antonyms are typically six steps or fewer in The New Collins Thesaurus. For example, 
authentic gives the synonym believable, believable gives probable, and so forth all the way
to unauthentic: 


(14) authentic > believable > probable > ostensible > pretended > spurious > 
unauthentic 
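
A synonym path of this kind is a shortest path through a directed graph in which each headword points to the synonyms listed at its entry. The Python sketch below is not Hardin's procedure, which is not described here; it is a standard breadth-first search, run over a toy graph that contains only the listings in (14). The names `LISTINGS` and `synonym_path` are hypothetical.

```python
from collections import deque

# Directed synonym listings: headword -> synonyms given at that entry.
# Only the links shown in (14) are included; a real thesaurus has many more.
LISTINGS = {
    "authentic": ["believable"],
    "believable": ["probable"],
    "probable": ["ostensible"],
    "ostensible": ["pretended"],
    "pretended": ["spurious"],
    "spurious": ["unauthentic"],
}


def synonym_path(start, goal, listings):
    """Breadth-first search for a shortest chain of synonym listings."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for synonym in listings.get(path[-1], []):
            if synonym not in seen:
                seen.add(synonym)
                queue.append(path + [synonym])
    return None  # no chain of listings connects the two words


path = synonym_path("authentic", "unauthentic", LISTINGS)
print(" > ".join(path))
print(len(path) - 1, "steps")
```

The search treats listings as directed, so the lack of reciprocity in thesaurus entries matters: in this toy graph there is a path from authentic to unauthentic but none in the other direction.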


27.4.2 Relations of Sense Contrast 


People are generally very sensitive to the distinction between contrastive and 
non-contrastive relations (Chaffin and Herrmann 1984), and this is reflected in thesau- 
ruses, where antonymy is the only relation that is reliably treated separately from the 
lists of ‘synonym’ relations. Contrast relations also differ from similarity relations (synonymy,
hyponymy) in that special status is given to binary relational sets: that is to say,
opposition. 


27.4.2.1 Co-hyponyms, Contrast Sets 


Co-hyponyms contrast within the same taxonomic level. Co-hyponymy within pairs of 
items is generally treated as oppositeness (antonymy). Larger sets (e.g. fork/knife/spoon) 
lack the special status of antonyms, but nevertheless play important roles in models of 
the mental lexicon and in dictionaries. 

The term contrast set is arguably more accurate than co-hyponym for two reasons. 
First, such sets do not entail lexical hyponymy. That is, we can speak of cigarette/ 
cigar/pipe/e-cigarette/nicotine gum as a contrast set of ‘nicotine delivery systems’ 
without needing a lexicalized term for the super-category. Second, co-hyponymy 
can connote a relation among nouns—that is, names for types of things. There 
are plenty of verb contrast sets, such as sit/stand/lie (the basic motion verbs that 
encode postures) and adjective contrast sets, such as (US) good/fair/serious/criti- 
cal in reference to medical conditions. Finally, once we allow co-hyponym sets 
without a lexical hyperonym, co-hyponymy is not clearly distinguishable from 
co-meronymy: stem/leaf/root, for example, are co-meronyms of plant or part of the 
contrast set of types of plant-parts. 

The boundary between synonymy and co-hyponymy is often less clearly made than 
that between synonymy and antonymy. Both the semantic category and taxonomic level 
can make a difference to whether we perceive co-hyponyms as contrasting or similar. 
At the sit/stand/lie level, the differences in the motion verbs are stark, and it would be 
impossible for anyone to perform the actions of sitting and standing simultaneously. But 
the types of sitting-down can overlap: one could flop, plop, or plunk oneself down. While 
we might be able to describe differences between prototypical ‘flopping’ and prototypical
‘plopping’, many sitting actions might be described as either. This leads to their treatment
as synonyms in thesauruses.


27.4.2.2 Contrast Sets in Definitions and Thesauruses


Co-hyponyms are not essential features of definition and so they rarely occur in noun 
and verb definitions, although ‘[i]n many older dictionaries definitions of adjectives
were laden with cohyponyms, but current dictionaries make serious efforts to avoid this’
(Atkins and Rundell 2008: 134). 

Contrast sets are more likely to be presented in thesauruses—but at lower rates 
than antonyms. The Chambers Thesaurus treatment of hyponym relations discussed 
in Section 27.3.1 is an example of explicit listing of co-hyponyms (at the hyperonym’s 
entry)—but these ‘type of’ lists do not always represent true contrast, as in (15): 


(15) types of precipitation include: dew, downpour, drizzle, fog, hail, mist, rain, rainfall, 
rainstorm, snow, snowfall, snowflake 


These ‘types of precipitation’ do not constitute a co-hyponym (or contrast) set, as some 
members do not contrast as types of precipitation (snow-snowfall-snowflake) and 
potential contrast sets within the list (e.g. snowflake/raindrop/hailstone) are left incom- 
plete. The list presentation is incompatible with the layers of the taxonomy and the com- 
peting taxonomies here. A more sophisticated graphic presentation could make these 
relations clearer. 


27.4.2.3 Antonymy 


Words in binary contrast are familiarly called opposites, and sometimes called antonyms, 
although this term is restricted in different ways by different authorities. Here antonym 
is used as a technical term for any lexical opposition. Antonymous senses are minimally 
different; they share all relevant properties except for one that causes them to be incompatible.
For example, up and down share that they name directions on a vertical axis, but
they denote irreconcilable directions. 

Antonymy is stereotypically associated with adjectives. Adjectives lend themselves 
easily to opposition by minimal difference because many describe simple properties in 
a single semantic dimension: high/low, hard/soft, hot/cold. But the adjective-antonymy 
association may distract us from the realities of antonym relations and use. For exam- 
ple, Lobanova et al. (2010) identified grammatical frames in which adjectival antonyms 
frequently co-occur and used them to automatically search for antonyms in a corpus. In 
spite of starting from adjectival frames, they found more antonymous noun pairs than 
adjectival antonyms. 

The binary character of antonyms can arise ‘naturally’ when there are only two 
co-hyponyms in a set (e.g. night/day), two extremes on a scale (heavy/light), or converse
perspectives on an activity (buy/sell). Morphological oppositions are also available, for 
example citizen/non-citizen, forgettable/unforgettable, agree/disagree. But binarity in 
opposition also arises in contexts where more than one potential opposite is available, 
indicating that there is something special about twos in contrast. For example, happy/sad
are usually treated as antonyms in relatively neutral contexts, whereas more contextual
support is needed for other positive/negative emotional pairings, such as happy/angry
or happy/afraid.

Table 27.2 shows semantic types of lexical opposition.


Table 27.2 Types of opposite relation

| Type (alternative names) | Description | Example |
|---|---|---|
| Complementary (contradictory) | Perfectly bisects a domain: if X is not-p, then it is q AND if X is not-q, then it is p | married/unmarried |
| Contrary (polar; true antonymy) | Two extremes of a scale that has a middle ground: if X is p, it is not-q; but if X is not-p it is not necessarily q | bright/dim |
| Converse | Opposed perspectives on the same activity or relationship: if X is p to Y, then Y is q to X; or (for verbs) if X p's to Y, then Y q's from X | parent/child; give/receive |
| Reverse | X reverses the actions/outcomes of Y | break/fix; fasten/unfasten |
| Directional | X is opposite to Y in space; X travels the opposite course to Y | north/south; to/from |


These are the most-mentioned categories of antonymy, but the list is not exhaustive.
We could add gender opposites (aunt/uncle), for example, and could further divide the 
DIRECTIONAL category (Lyons 1977; Cruse 1986). The categories are also not mutually 
exclusive; converse and reverse antonyms could arguably be classed as DIRECTIONAL. In 
use, many complementaries are more gradable than the category should logically allow. 
For instance, phrases like neither true nor false and more true than false indicate gradability
and a middle ground in a domain that is otherwise treated as if it is absolutely bisected.
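
The logical difference between the complementary and contrary types in Table 27.2 can be made concrete with a small sketch. The function name, the toy domains, and the numeric cut-off points below are illustrative assumptions, not part of any cited analysis; the sketch only checks the two conditions the table states (incompatibility, and the presence or absence of a middle ground) over a finite set of items.

```python
from typing import Callable, Iterable


def opposition_type(p: Callable[[object], bool],
                    q: Callable[[object], bool],
                    domain: Iterable[object]) -> str:
    """Classify predicates p and q over a finite domain (toy logic only)."""
    items = list(domain)
    # Opposites must be incompatible: nothing is both p and q.
    if any(p(x) and q(x) for x in items):
        return "not opposites"
    # Complementaries leave no middle ground: everything is p or q.
    if all(p(x) or q(x) for x in items):
        return "complementary"
    # Otherwise some items are neither p nor q, as with contraries.
    return "contrary"


# Hypothetical brightness scale 0-10 with arbitrary cut-off points.
brightness = range(11)
bright_dim = opposition_type(lambda v: v >= 7, lambda v: v <= 3, brightness)

# Hypothetical marital-status records.
people = [{"married": True}, {"married": False}]
married_unmarried = opposition_type(lambda x: x["married"],
                                    lambda x: not x["married"], people)

print(bright_dim, married_unmarried)
```

As the discussion of neither true nor false suggests, real usage is less tidy than this: adding a single "in-between" item to a domain is enough to move a pair from the complementary to the contrary class.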

The subtypes of antonyms in Table 27.2 are generally not mentioned in dictionary or
thesaurus entries—the label ‘antonym’ or ‘opposite’ suffices. This is neither surprising 
nor concerning. Language users are generally not very sensitive to the divisions among 
logical antonym types (Chaffin and Herrmann 1984). Investigating the discourse functions
of antonyms in text, Jones (2002) found that the traditional, logical categories
held little import in what people did with antonyms. Murphy (2003) argues that these 
distinctions are generally predictable from the meaning of the terms involved—that is, 
gradable meanings have contrary antonyms, relational terms have converse opposites, 
and so forth. 

Antonymy is sometimes argued to be a syntagmatic as well as a paradigmatic relation 
(e.g. Jones 2002; Murphy 2006), since members of antonym pairs tend to co-occur in 
text at high rates in a variety of grammatical frames (Jones 2002; Davies 2012). Because 
antonym paradigms have only two members and we experience them together, their 
pairings have the opportunity to become conventionalized to a greater extent than other
relational paradigms. Such conventionalized pairs are often called ‘canonical antonyms’
(Murphy 2003) or ‘direct antonyms’ (Gross et al. 1989), and they are not always predict- 
able on semantic grounds. For instance, it is likely that we prefer the pairing theory/ 
practice over theory/application or theory/execution simply because we hear theory and 
practice together more often and therefore they seem an idiomatic pair. Because this 
kind of information is learnt, it bears representation in dictionaries, particularly those 
for learners, who have had less exposure to the canonical pairings. 


27.4.2.4 Antonymy in Definitions and Thesauruses 


Antonyms are sometimes used in definitions, particularly for adjectives, as when absent 
is defined as ‘not present’. The usefulness of such definitions is limited, however, by
(a) the aim to define words in simpler terms than the headword (defining a medical 
sense of open as ‘not laparoscopic’ expects that the user has some medical expertise), 
and (b) the circularity that results if they are applied to both members of the pair (e.g. 
present = ‘not absent’; absent = ‘not present’). 
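
The circularity in (b) is easy to detect mechanically. The following is a minimal sketch, assuming a hypothetical mapping from each headword to the word its definition negates (the mapping and function name are invented for illustration and do not reflect any particular dictionary's data format).

```python
from typing import Dict, List, Tuple


def circular_pairs(negation_defs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return pairs of headwords that are each defined as 'not' the other."""
    pairs = set()
    for headword, negated in negation_defs.items():
        # Circular if the negated word is itself defined as 'not headword'.
        if negation_defs.get(negated) == headword:
            pairs.add(tuple(sorted((headword, negated))))
    return sorted(pairs)


# Hypothetical data: headword -> word X in a definition of the form 'not X'.
defs = {"absent": "present", "present": "absent", "open": "laparoscopic"}
print(circular_pairs(defs))
```

Only absent/present is flagged here; open is defined by negation but not circularly, which corresponds to problem (a) rather than (b).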

Dictionaries occasionally present antonyms as run-on information at sense entries, 
but there is little evidence that this is approached systematically. Examining Collins 
Cobuild Advanced Learner’s English Dictionary, Paradis and Willners (2007) found that 
one third of its antonym pairs are presented unidirectionally; the entry for odd gives 
even, but even’s entry presents no antonym, for example. They conclude that in spite of 
its founding principles, ‘the CoBUILD project does not seem to have gone the whole hog 
and carried out a proper corpus-driven analysis of antonyms’ (2007: 274). 
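
A consistency check of the kind implied by Paradis and Willners's finding can be sketched in a few lines. This is an illustration under stated assumptions, not their method: the antonym data are represented as a hypothetical mapping from headword to the set of antonyms listed at its entry, and the sample entries are invented apart from the odd/even case reported above.

```python
from typing import Dict, List, Set, Tuple


def unidirectional_pairs(antonyms: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """List (headword, antonym) links that are not reciprocated."""
    missing = []
    for headword, opposites in antonyms.items():
        for opposite in opposites:
            # The link is one-way if the antonym's entry does not point back.
            if headword not in antonyms.get(opposite, set()):
                missing.append((headword, opposite))
    return sorted(missing)


# Hypothetical entries: 'odd' gives 'even', but 'even' gives no antonym.
entries = {"odd": {"even"}, "even": set(), "hot": {"cold"}, "cold": {"hot"}}
print(unidirectional_pairs(entries))
```

Running such a check over a whole dictionary would yield a list of entries to review; whether each gap is an oversight or a deliberate editorial choice remains a lexicographical judgement.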

Specific dictionaries of antonyms are occasionally produced (Dictionary of Contrasting 
Pairs, Room 1988), but as van Sterkenburg (2003c: 143) observes, ‘24-carat antonym 
dictionaries are few and far between. Instead, antonyms tend to be solid pillars in
synonym dictionaries’. Indeed, many thesaurus-type resources are entitled (something
like) Dictionary of Synonyms and Antonyms (including ones published by Chambers, 
Collins, Merriam-Webster, Oxford, Penguin). Although such titles advertise synonym 
and antonym content equally, the antonyms are invariably last and fewer than the syn- 
onyms. The Oxford Dictionary of Synonyms and Antonyms, for example, instructs the 
reader to find additional antonyms by looking up the synonyms of the antonyms offered 
(p. viii). Otherwise, its introductory material makes no mention of antonyms. In such 
dictionaries and thesauruses, antonyms seem to be over-represented for adjectives and 
under-represented for nouns. For example, theory/practice is one of the most commonly 
co-occurring antonym pairs in Jones and Murphy's (2011) corpus-driven antonym-finding
study, yet Collins English Thesaurus, Merriam-Webster (m-w.com), Oxford Dictionary
of Synonyms and Antonyms, and Webster’s New World Thesaurus give no antonyms for
practice, although they all give theoretical at the adjective practical.¹


¹ Chambers Thesaurus alone offers theory at its practice entry. Collins gives practice at theory.
Merriam-Webster offers other near-antonyms (fact, knowledge) at theory.

Merriam-Webster singles itself out by distinguishing between an antonym ‘a meaning
that completely cancels out another words [sic] meaning’ and near-antonyms
which are not necessarily incompatible, but are ‘markedly contrasting’
(<http://www.merriam-webster.com/help/thesnotes/antonyms.htm>). Unsurprisingly, the
distinction is not always clear. For instance, basic and original are presented as antonyms of
secondary, while fundamental and primary are given as near-antonyms. This is an artefact
of the wording of secondary’s definition ‘taken or created from something original or
basic’, rather than a difference in logical relations.


27.5 SENSE RELATIONS, LEXICOGRAPHY, 
AND TECHNOLOGY 


27.5.1 Electronic Thesauruses 


Technological advances have changed how we can (and must) represent and think 
about sense relations in a number of ways. First, computer interfaces make resources 
more searchable, liberating users from the ‘alphabetical straitjacket’ (de Schryver 
2003: 157) and encouraging more onomasiological structures and processes. Second, 
large-scale corpora—and the means to interrogate them—have changed the evidence 
base for dictionary compilation, sense differentiation, and definition. 

Electronic thesauruses have been developed since the 1950s, with a particular focus 
on document retrieval with respect to technical vocabularies. This is not the place to 
review such projects (see Aitchison and Clarke 2004), but it is worth noting that the 
representation of relations is motivated by the need for terminological consistency and 
accessibility. Related and relevant is the artificial intelligence goal of a semantic web in 
which the system implements ontologies—that is, systematic representation of concepts 
and relations among them—to understand users’ natural-language queries. Unlike 
traditional print thesauruses, many natural-language processing (NLP) projects have 
the need and the resources to explicitly mark different relational types (e.g. IS-A,
IS-A-TYPE-OF, IS-A-PART-OF).

Particularly important among electronic resources is WordNet (see Fellbaum 1998b), 
the electronic lexical database of English founded at Princeton University in the 1980s. 
As mentioned in Section 27.4.1, WordNet organizes words into sets of synonyms (syn- 
sets), which are linked to one another by other lexical relations. Different relations 
prevail in the separate sub-lexicons for nouns (hyponymy), verbs (troponymy), and 
adjectives (antonymy). The English WordNet project continues to develop and is repli- 
cated in dozens of languages (Fellbaum 2006). 

The WordNet architects originally intended to encode ‘the vast range of evidence 
for the synchronic organization of the lexicon that psycholinguists have gathered’ 
(Beckwith et al. 1991: 212) from lexical speech errors, aphasia, word-association data, 
and other experimental investigations. But at the level of recording particular relations
between particular words, WordNet has mainly relied on published lexicographical 
resources and native-speaker intuitions; that is, WordNet is not much different from 
traditional lexicographical products, except for its elaborate relational structure and its 
easy-to-search format. 

Because of its accessibility, WordNet has become the standard source for relational 
information in NLP projects and more and more semantic-web projects (W3C 2006), 
for better or worse. Kennedy and Szpakowicz (2008) found that the words presented 
as synonyms in Roget’s Thesaurus (1987 [1852]) performed no worse and sometimes 
better than those in WordNet on various measures of semantic similarity. Others 
(e.g. Sampson 2000) have criticized WordNet for relating senses in ways that are not 
semantically justified, for example treating weighty and weightless as direct antonyms
because of their morphological similarity.

Béjoint (2010: 21) has written that electronic lexicography ‘may’ reinvigorate ono- 
masiological dictionaries, quoting McArthur’s (1998a: 150) observation that thematic 
lexicography is ‘a covert influence on much of current reference-book practice, and is 
enjoying a modest overt renaissance among “print” dictionaries, and could also have 
a useful future as lexicography moves into the electronic era’. The twelve years between
McArthur’s could and Béjoint’s may indicate that interest in producing onomasiological 
dictionaries is no more certain. But non-print technology helps present relational
information from existing resources in more accessible and intuitively understandable ways.
In the web-based Thinkmap Visual Thesaurus (1998-), the user can watch an animation 
of WordNet synsets branching out from the query word, resulting in representations 
like Figure 27.3. 

Each synonym set is clustered around a sense node (represented as a dot), which the 
user can click for a definition. While the interface allows the options of specifying particular
relations among the sixteen it claims to represent, its usability does not extend
much past synonym relationships. For example, while the user is allowed to restrict a
search to only TYPE-OF relations, PART-OF relations are returned with them. If a direct-antonym
relationship is not represented in WordNet (say one wants an opposite for
recent), then the user can only click through the synonyms or hyperonyms offered in
hope of finding some with applicable antonyms.

FIGURE 27.3 Endpoint of relational animation for rabbit, Thinkmap Visual Thesaurus

As it stands, the electronic resources that are readily available to regular thesaurus 
users (particularly those bundled with word-processors or operating systems) are faster 
to search and therefore may be used more. But they do not yet offer richer relational 
information than print sources have afforded. 
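
The manual click-through described above (looking for antonyms via a word's synonyms when no direct antonym is recorded) is the sort of step an interface could perform for the user. The sketch below is a toy illustration only: the function name and the small synonym and antonym tables are invented assumptions and are not taken from WordNet or from any thesaurus.

```python
from typing import Dict, Set


def find_antonyms(word: str,
                  synonyms: Dict[str, Set[str]],
                  antonyms: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return direct antonyms, or antonyms of synonyms if none are listed."""
    direct = set(antonyms.get(word, set()))
    if direct:
        return {"direct": direct, "indirect": set()}
    indirect: Set[str] = set()
    # Fall back to the antonyms recorded for each synonym of the word.
    for synonym in synonyms.get(word, set()):
        indirect |= antonyms.get(synonym, set())
    return {"direct": set(), "indirect": indirect}


# Hypothetical toy data, not drawn from any real resource.
SYNONYMS = {"recent": {"new", "fresh"}}
ANTONYMS = {"new": {"old"}, "fresh": {"stale"}}

result = find_antonyms("recent", SYNONYMS, ANTONYMS)
print(sorted(result["indirect"]))
```

The indirect candidates still need vetting: stale is a plausible opposite of fresh but a doubtful one for recent, which is why such suggestions are better presented as a separate, lower-confidence list.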


27.5.2 Researching Relations 


NLP demands for relational information have encouraged attempts to automate rela- 
tion detection. Such projects are considered relatively successful if they achieve rates of 
70-80 per cent in correctly identifying related words (Pantel and Pennacchiotti 2006). 

Relation-identification attempts use two types of method: pattern-based or cluster 
searching. Most have been of the pattern-based type, finding lexico-grammatical pat- 
terns that often involve words that are related in a certain way, then seeing which words 
occur in those patterns within a large corpus. For instance, the pattern X and other Ys 
can be searched (with wildcard operators in the X and Y positions) to find instances of 
X < Y hyponymy. Cluster methods, in contrast, identify semantic properties of words 
in text based on their syntactic dependencies and hypothesize relations between ones 
whose semantic profiles fit the relation. This has mostly been used for synonymy and 
requires very large corpora (Pantel and Pennacchiotti 2006). 
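
The pattern-based approach can be illustrated with the X and other Ys frame mentioned above. This is a minimal sketch under simplifying assumptions: it uses a plain regular expression over raw text, matches single-word X and Y only, and does no lemmatization or part-of-speech filtering, all of which a real system would need. The sample sentences are invented.

```python
import re
from typing import List, Tuple

# 'X and other Ys': X is a candidate hyponym of Y (single words only here).
PATTERN = re.compile(r"\b(\w+) and other (\w+)\b", re.IGNORECASE)


def hyponym_candidates(text: str) -> List[Tuple[str, str]]:
    """Return (hyponym, hyperonym) candidates found by the pattern."""
    return [(m.group(1).lower(), m.group(2).lower())
            for m in PATTERN.finditer(text)]


# Invented sample text standing in for a corpus.
sample = "She grows roses and other flowers. Hail and other precipitation fell."
candidates = hyponym_candidates(sample)
print(candidates)
```

The output pairs are only candidates. A sentence such as ‘Smith and other critics’ would match just as readily, which is one source of the false hits that keep success rates for such methods well below 100 per cent.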

Synonymy is the most common target of automation attempts, since there are many 
practical applications for synonym-detection, such as information retrieval: if I search
the web for ‘garden clippers’, it would be helpful if the search engine recognized this
as a search for secateurs. Heylen et al. (2008) review some of the available methodolo- 
gies. Hyponymy (e.g. Hearst 1992; Tjong Kim Sung and Hoffman 2007) and meronymy 
(e.g. Berland and Charniak 1999) have been the subject of various pattern-based discov- 
ery methods. Antonymy is a latecomer to this party, and interest has been more lexico- 
graphical than for the other relations, since the NLP applications requiring antonyms 
are fewer (Jones et al. 2007; Lobanova et al. 2010). 

Such approaches still generate up to 30 per cent false hits and thus will not replace the 
human element in lexicography, but they can offer valuable insights for the lexicographer. 
For example, Murphy, Jones, and Koskela (2015) searched for parallel grammatical contexts
(‘Adj-to-Verb-Conj-Adj-to-Verb’ and ‘Adj-Prep-Noun-Conj-Adj-Prep-Noun’) to
find potential antonym pairs in a corpus. Pairs that co-occurred often in the corpus were 
vetted and ‘canonical’ pairings identified. Only one (day/night) of the nine noun pairs we 
found is included in Oxford American Writer’s Thesaurus (so, no adult/child, content/form, 
nature/nurture, etc.). Four of the fourteen adjective pairs are not in the thesaurus (bottom- 
up/top-down, explicit/implicit, general/particular, gentle/tough), and neither of the verbs 
were (although they might be considered better co-hyponyms than opposites): learn/ 
teach, wash/dry. Other thesauruses do no better on this test. The moral of this story is that 
the time is ripe to pursue corpus-based improvements on onomasiological resources. 
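
The general idea of counting word pairs that co-occur in contrastive frames can be sketched as follows. This is a deliberate simplification and not the procedure of the study described above: instead of part-of-speech-tagged parallel contexts it uses three fixed lexical frames (neither X nor Y, both X and Y, either X or Y), which are assumptions chosen for illustration, and the sentences are invented.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Illustrative lexical frames; real studies use richer, tagged contexts.
FRAMES = [
    re.compile(r"\bneither (\w+) nor (\w+)\b", re.IGNORECASE),
    re.compile(r"\bboth (\w+) and (\w+)\b", re.IGNORECASE),
    re.compile(r"\beither (\w+) or (\w+)\b", re.IGNORECASE),
]


def frame_pair_counts(sentences: Iterable[str]) -> Counter:
    """Count unordered word pairs that fill the X and Y slots of a frame."""
    counts: Counter = Counter()
    for sentence in sentences:
        for frame in FRAMES:
            for match in frame.finditer(sentence):
                x, y = match.group(1).lower(), match.group(2).lower()
                # Sort so that X/Y and Y/X count as the same pair.
                counts[tuple(sorted((x, y)))] += 1
    return counts


# Invented sentences standing in for a corpus.
corpus = [
    "The claim is neither true nor false.",
    "Both theory and practice matter.",
    "It is either true or false.",
    "She values both practice and theory.",
]
pair_counts = frame_pair_counts(corpus)
print(pair_counts.most_common())
```

Frequent pairs from such a count are only candidates for canonical status and would still need to be vetted by hand, as the pairs in the study were.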


ANU KOSKELA


28.1 INTRODUCTION 


WHEN considering the treatment of a lexical form with multiple different senses, a 
lexicographer must decide whether to separate some of the senses into a different dic- 
tionary entry. Splitting senses into different entries with identical headwords creates 
homonyms, whereas lumping them under the same headword means representing 
them as senses of a single polysemous lexical item. The traditional distinction between 
homonymy and polysemy revolves around the notion of relatedness: the senses of a 
polyseme are related, whereas those of homonyms are not; homonyms are distinct 
lexical items that happen to share the same form. That both polysemy and homonymy 
exist within a language's lexicon is a matter of consensus among theoretical linguists 
and there is even some psycholinguistic evidence that polysemes and homonyms may 
be processed and represented differently in the mental lexicon (see e.g. Klepousniotou 
2002; Beretta et al. 2005). However, for the lexicographer, the answer to the question of 
when to create homonymous lexical entries is by no means obvious. This is in part due 
to theoretical complexities in the definition of homonymy, which mean that there are no 
clear-cut criteria that can unequivocally categorize a particular case as either polysemy 
or homonymy. But furthermore, when it comes to identifying homonyms, the interests 
of the theoretical linguist and the lexicographer do not always meet. In practical lexi- 
cography, how homonymy is represented (and even whether it is represented at all) is 
a decision that impacts on the macrostructure of the dictionary, that is, the number of 
headwords the dictionary contains. Such decisions must take into account the function 
of the dictionary and the needs of its target audience. For this reason, different types of 
dictionaries come to different decisions regarding the identification and representation 
of homonyms. This chapter first considers the complexities of defining homonymy and
how the polysemy/homonymy distinction relates to lexicographic concerns (Section 
28.2). In Section 28.3 I discuss how homonyms are represented in different types of dic- 
tionaries, based on a review of thirty-three monolingual English-language dictionaries 
that vary with respect to their target audience (adult native speakers, children, or lan- 
guage learners), their size (unabridged to pocket-size), and their range (general-pur- 
pose or special-purpose). Section 28.4 concludes the chapter. 


28.2 DEFINITION OF HOMONYMY 


Homonyms are defined as distinct lexical items that share an identical form but whose 
senses are not related. This basic definition raises two issues: (1) What counts as formal
identity? and (2) How is the relatedness of the senses determined? In this section, I first 
address the question of relatedness and then turn to the issue of formal identity. 

As noted above, homonymy is usually defined in contrast with polysemy, where a single
lexical item has multiple related senses. The notion of relatedness may be defined
either in diachronic or synchronic terms. Diachronic relatedness is a question of ety- 
mology, while synchronic relatedness requires a discernible semantic relation between 
the senses in present-day language. 


28.2.1 Diachronic Relatedness 


In diachronic terms, polysemous senses emerge through the processes of meaning 
extension that operate in language. Thus, for example, the ‘device for moving the pointer 
on a computer screen’ sense of mouse is a metaphorical extension of the ‘small rodent’ 
sense of the word. In homonymy, on the other hand, the shared form of the distinct 
lexical items is a matter of linguistic accident. It may be the result of phonological or 
morphological changes that cause once distinct lexical forms to become homonymous, 
or else come about through lexical borrowing if a loanword has a similar form to a word
already existing in the language. For example, the form calf¹ meaning ‘young bovine animal’
can be traced back to Old English cealf, whereas calf², ‘the back of a person's leg
below the knee’ is a borrowing of Old Norse kálfi. Through phonological and orthographic
change, the two lexical items have come to share the same form. As a means
for distinguishing polysemy and homonymy, the diachronic criterion has the advantage 
that it should in principle be possible to establish objectively whether particular senses 
of a wordform share the same historical origin or not—at least as far as written records 
are available for the language. However, diachronic relatedness may sometimes depend 
on how far back in history you go. Take, for instance, the classic example of homonymy, 
bank. The ‘mound of earth at the edge of a river’ and ‘financial institution’ senses entered 
the English language at different times and from different sources (the first from Old
Norse, the second from Italian, via French), but they can ultimately be traced back to the 
same Germanic origin. (See further examples in Durkin, this volume). 

Some theoretical linguists, however, see the origins of words as irrelevant for the 
description of the current lexical structure of a language or the lexical knowledge of a 
native speaker (see e.g. Pustejovsky 1995: 28). Speakers are usually not aware of the histories
of words and even if they are, such knowledge is metalinguistic knowledge, knowledge
about words, rather than lexical knowledge. In lexicography, the diachronic criterion
is naturally important for historical dictionaries (see Durkin, this volume). But other 
types of dictionaries also often turn to etymology as a basis for homonym identifica- 
tion. As Atkins and Rundell (2008: 192) note, etymology provides a simple-to-apply rule 
when deciding when to split senses into different entries. However, others (e.g. Zgusta 
1971; Robins 1987; Cowie 2001) maintain that a synchronic dictionary should base its 
homonym identification policy not on word histories, but on relations between mean- 
ings in present-day language. 


28.2.2 Synchronic Relatedness 


The alternative, synchronic definition of ‘relatedness’ considers the semantic relations 
between the different senses of the lexical form. The senses of mouse noted above are 
clearly related semantically (the shape of a computer mouse resembles that of a rodent 
mouse), but any semantic connection between the senses of calf would be rather tenu- 
ous. However, what counts as a semantic relationship is notoriously difficult to define 
objectively. Intuitive judgements of semantic relations are necessarily subjective and 
may be influenced by one’s inclination for finding differences or similarities in mean- 
ing (Tuggy 1999). When I used the ‘permanent ink mark on skin’ and ‘military drum 
signal’ senses of tattoo as an example of homonymy in a lecture, some of my students 
(all native speakers of British English) pointed out that they actually saw a connection 
between drumming and the rapid movement of the tattooist’s mechanical needle. This 
example also illustrates how the diachronic and synchronic criteria may disagree with 
each other: the two senses of tattoo are etymologically distinct (the ‘ink mark’ sense 
derives from Polynesian languages, while the ‘drum signal’ sense is from Dutch), but 
may be semantically related, at least for some speakers. Similarly, it is possible to see 
a semantic connection between the senses of scour ‘clean by scrubbing’ and ‘search an
area’ insofar as both refer to doing something thoroughly. The two senses are, however,
etymologically unrelated: the former is ultimately from popular Latin excūrare, ‘polish,
clean off’ while the latter is of unknown origin. Conversely, the senses of club ‘a heavy
stick’ / ‘a group of people with a common interest’ are historically related (the latter
sense is derived from the former, although obscurely), but the semantic connection 
between the two is far from obvious. Complexities such as these make semantic relat- 
edness a difficult criterion for the lexicographer to apply in identifying homonyms. 


28.2.3 Formal Identity in Homonymy


Apart from questions of relatedness, the identification of homonyms is also depend- 
ent on the formal properties of lexical units: pronunciation, spelling, and morphosyn- 
tactic properties. Homonyms may share the same phonological form (homophones) 
or orthographic form (homographs) or both. In lexicography, the treatment of 
non-homographic homophones like weak and week is generally unproblematic, as 
their different orthographic forms necessitate distinct headwords. Homographs that 
are not homophones, such as bass /beɪs/, /bæs/ or wind /wɪnd/, /waɪnd/ are also usually
split into different headwords, but not all dictionaries follow this principle. For
instance, Collins Cobuild English Language Dictionary (1987) (= Cobuild1) has a deliber- 
ate policy of avoiding homonymous entries altogether. It therefore groups, for example, 
both pronunciations of bass under one headword. Moon (1987: 88) justifies this policy 
by noting that the orthographic form is what provides users access to the words and 
meanings in a dictionary. The dictionary also ignores capitalization and groups under a 
single headword both senses that are spelled with a lower case initial and those spelled 
with a capital (including may as a modal verb and May the month). A similar policy is 
adopted by some other dictionaries, including some smaller pocket-size dictionaries as 
a space-saving device (see Section 28.3.1.1 below). 

The opinion of lexicographers and theoretical linguists alike is divided over the extent 
to which the polysemy/homonymy distinction is affected by morphosyntactic fac- 
tors. Consider, for instance, cases of conversion, such as the derivation of the noun a 
drink from the verb to drink. If we assume that units of lexical representation should 
have a constant set of formal properties, including part of speech, these senses should 
be separated into distinct lexical items (drink¹ v. and drink² n.). But treating these senses
of drink as homonyms fails to represent the clear historical and semantic relationship 
between them—which some consider to be more important than their grammatical dis- 
tinctiveness (e.g. Lehrer 1990). The question is essentially what the primary unit of lexi- 
cal description should be—whether it is defined in grammatical terms or defined as a set 
of any semantically (and/or diachronically) related senses (what Malakhovski 1987 calls 
a ‘hyperlexeme’). 

However, if grammatical factors are deemed to be relevant in defining lexical items, 
what degree of morphosyntactic difference is sufficient for splitting senses into separate 
entries? Even senses that have the same part of speech can differ from each other in 
various ways in terms of their grammatical behaviour (see e.g. Lyons 1977 and Cowie 
2001 for discussion). Senses may differ with respect to their grammatical sub-class—for 
example, the ‘material’ sense of glass is a mass noun while the ‘drinking vessel’ sense is 
a count noun. Senses may also have different inflectional forms: for instance, the past 
tense of hang is hanged in the ‘execute by suspending by the neck’ sense whereas the 
irregular hung is used otherwise. Senses may also differ in terms of their syntagmatic 
properties, such as the types of collocates they allow or, in the case of verbs, the argu- 
ment structures they admit. Compare, for instance, the uses of the verb fly in The plane 
flew to Paris and The airline flies 8 million customers to Paris every year. However, at least 
in English lexicographic practice, such fine-grained morphosyntactic distinctions are
generally not treated as criterial in separating senses into homonymous entries. Cowie 
(2001), for instance, notes that he has not found any monolingual English dictionaries 
where the count and mass noun senses of staff (‘long stick’ / ‘the people employed by a
company or organization’) would be separated into different entries. But many diction- 
aries do indeed separate homonyms on the basis of part of speech. Apart from the fact 
that this makes for a simple principle that can be applied consistently, creating sepa- 
rate headwords for different parts of speech can also be helpful for the user. Atkins and 
Rundell (2008) argue that users of learners’ dictionaries, for example, often know the 
part of speech of the unknown word, and may therefore find it helpful to have different 
parts of speech separated, rather than in a single entry (see Section 28.3.2 below).


28.2.4 The Representation of Homonyms in Dictionaries 


As the discussion above has shown, at least the following types of lexical units may
be treated either as homonyms or as senses of a single polysemous lexical item (see Atkins
and Rundell 2008: 192-3 for discussion of these types): 


• Same written/spoken form, different etymology, distinct meaning
  E.g. bat ‘stick used in some ball games’ / ‘flying mammal’; calf ‘young bovine
  animal’ / ‘back of a person’s leg below the knee’

• Same written/spoken form, different etymology, related meaning
  E.g. scour ‘clean by scrubbing’ / ‘search an area’; ear ‘hearing organ’ / ‘head
  of corn’

• Same written/spoken form, same (if distant) etymology, distinct meaning
  E.g. bank ‘mound of earth at the edge of a river’ / ‘financial institution’; club ‘a
  heavy stick’ / ‘a group of people with a common interest’

• Same written/spoken form, same etymology, related meaning, different grammatical
  properties
  E.g. drink noun/verb; round adjective/noun/verb/adverb/preposition; hang,
  past tense hung or hanged

• Same written form, different spoken form (heteronyms)
  E.g. bass /beɪs/, /bæs/

• Same spoken form, different capitalization
  E.g. May/may; Swede (nationality) / swede (Swedish turnip, rutabaga)
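
The six types listed above can be read as the outcome of a small set of yes/no properties of a pair of senses. The following sketch makes that reading explicit. The class, field names, category labels, and the order in which the properties are tested are all assumptions made for illustration; in particular, the judgement of whether two meanings are ‘related’ is supplied by hand, which is exactly the subjective step discussed in Section 28.2.2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensePair:
    same_spelling: bool        # ignoring capitalization
    same_pronunciation: bool
    same_capitalization: bool
    same_etymology: bool
    related_meaning: bool      # a human judgement, not computed here
    same_grammar: bool


def classify(pair: SensePair) -> str:
    """Assign a pair of senses sharing a form to one of the listed types."""
    if pair.same_spelling and not pair.same_pronunciation:
        return "heteronyms"
    if pair.same_pronunciation and not pair.same_capitalization:
        return "different capitalization"
    if not pair.same_etymology:
        return ("different etymology, related meaning" if pair.related_meaning
                else "different etymology, distinct meaning")
    if not pair.related_meaning:
        return "same etymology, distinct meaning"
    if not pair.same_grammar:
        return "same etymology, related meaning, different grammar"
    return "plain polysemy"


# Feature values follow the examples in the list above.
calf = SensePair(True, True, True, False, False, True)
bank = SensePair(True, True, True, True, False, True)
drink = SensePair(True, True, True, True, True, False)
bass = SensePair(True, False, True, False, False, True)
print(classify(calf), "|", classify(bank), "|", classify(drink), "|", classify(bass))
```

The classification itself does not say whether a type should be split into separate entries; that remains a policy decision for each dictionary.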


Thus the distinction between polysemy and homonymy may be made on diachronic, 
synchronic, or formal (including grammatical) bases, but in many cases the different 
criteria conflict with each other or are otherwise unstable. That is, although polysemy 
and homonymy are in principle distinct phenomena, when it comes to specific cases 
of meaning variation, the categories cannot be distinguished in a clear-cut, unequivo- 
cal manner. Some theoretical lexical semanticists have proposed that the distinction 
between polysemy and homonymy should be viewed as a matter of degree (e.g. Cowie
2001; Tuggy 1993). This means that particular sets of senses can be treated as falling 
somewhere on a continuum between polysemy and homonymy—for example, senses 
that are historically but not semantically related or senses that belong to different parts 
of speech but are otherwise semantically and etymologically related. 

The polysemy/homonymy continuum notion is, however, not very relevant to 
the business of dictionary making. For lexicographic representation the distinction 
between polysemy and homonymy remains essentially dichotomous: the lexicographer 
has to make a definite decision over whether particular senses are listed under the same 
entry or separated into different lexical items (Robins 1987). There exists, however, a 
range of specific strategies and representational conventions for dealing with lexical 
forms with multiple senses (see Malakhovski 1987 for some discussion). Table 28.1 pro- 
vides a simplified illustration of four basic patterns with some senses of tattoo. As the 
table shows, the basic patterns are also subject to variations: the ordering and number- 
ing of the headwords or senses within a single entry may either prioritize listing gram- 
matically similar senses or entries contiguously or group semantically similar senses/ 
entries together. 
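
The effect of a lumping or splitting policy on the macrostructure can be sketched by grouping the same senses under different keys. This is an illustration only: the three policies shown (one entry for everything, one entry per etymological origin, one entry per origin and part of speech) are assumptions and need not correspond exactly to the four patterns of Table 28.1, and the verb sense is added here simply to give part of speech something to distinguish.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, NamedTuple


class Sense(NamedTuple):
    pos: str
    origin: str
    gloss: str


def build_entries(form: str, senses: List[Sense],
                  key: Callable[[Sense], Hashable]) -> Dict[str, List[Sense]]:
    """Group senses into entries; number the headwords if there are several."""
    groups: Dict[Hashable, List[Sense]] = defaultdict(list)
    for sense in senses:
        groups[key(sense)].append(sense)
    grouped = list(groups.values())
    if len(grouped) == 1:
        return {form: grouped[0]}
    # Homonymous headwords are numbered in order of first appearance.
    return {f"{form} {i}": g for i, g in enumerate(grouped, start=1)}


# Illustrative senses of tattoo; origins as described in Section 28.2.2.
TATTOO = [
    Sense("noun", "Polynesian", "permanent ink mark on skin"),
    Sense("verb", "Polynesian", "mark skin with ink"),
    Sense("noun", "Dutch", "military drum signal"),
]

lumped = build_entries("tattoo", TATTOO, key=lambda s: None)
by_origin = build_entries("tattoo", TATTOO, key=lambda s: s.origin)
by_origin_and_pos = build_entries("tattoo", TATTOO,
                                  key=lambda s: (s.origin, s.pos))
print(len(lumped), len(by_origin), len(by_origin_and_pos))
```

The same three senses thus yield one, two, or three headwords depending on the policy, which is the sense in which homonym identification is a macrostructural decision.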

Of these patterns, Malakhovski (1987: 47) sees the first, grouping all senses under a single
entry, as ‘the most vulnerable from the theoretical point of view’, given that it does not
differentiate between lexical phenomena (homonymy and polysemy) or types of lexical 
units (grammatical words or ‘hyperlexemes’). However, some lexicographers, including 
Tarp (2001), have questioned to what extent lexicography should be informed by theo- 
retical linguistics. Tarp maintains that the practical problems of lexicography cannot be 
solved through theoretical linguistics. Instead, decisions such as the number of head- 
words to include in a dictionary should be made solely on the basis of practical consid- 
erations, specifically the functions of the dictionary and the needs of the intended user. 
He suggests, for example, that at least in some types of dictionaries it would be appropri- 
ate to split senses into homonymous entries only in cases where a single entry would be 
too long. According to him, explicit information about the distinction between polyse- 
mes and homonyms is irrelevant for the user who just refers to the dictionary for help 
with producing or understanding language (e.g. checking the spelling of a word or look- 
ing up the meaning of an unknown word). He also argues that representing homonyms 
and polysemes differently is potentially confusing and may even hinder the user's access 
to the information they seek. However, one could counter this argument by noting that 
grouping semantically distinct senses under different headwords might well help the 
user’s search for the correct meaning. For example, a user looking for the ‘military drum 
signal’ sense of tattoo may well appreciate it being listed under a separate headword 
from the unrelated ‘ink mark on skin’ sense. 

Ultimately, however, the design of the macrostructure of a dictionary needs to be 
motivated by the usability and accessibility of the information for the intended user and 
practical considerations such as the space allocated for the entries. The remainder of this 
chapter provides a review of different types of dictionaries and the strategies they use in 
identifying and representing homonyms. 

28.3 HOMONYMS IN DIFFERENT TYPES OF DICTIONARIES




Types of dictionaries can be distinguished on the basis of a variety of criteria (see e.g. 
Malkiel 1967; Zgusta 1971; Landau 2001; Swanepoel 2003). Here I consider dictionaries 
that differ with respect to the range of vocabulary they cover, their intended users, and 
their size; all of these factors may influence how a dictionary identifies homonyms. All
dictionaries reviewed here are print monolingual English dictionaries, most published 
in the UK (for reasons of space, dictionaries of other languages or bilingual dictionaries 
were excluded from this review). The dictionaries surveyed are generally synchronic in 
their perspective. 

I begin in Section 28.3.1 by examining general-purpose dictionaries of different sizes 
aimed at native speakers of English; smaller concise and pocket-size dictionaries are 
discussed in Section 28.3.1.1. Sections 28.3.2 and 28.3.3 then cover monolingual learn- 
ers’ dictionaries and children’s dictionaries, respectively. Finally, in Section 28.3.4 I con- 
sider the representation of homonyms in specialist dictionaries, including single-field 
dictionaries and special-purpose dictionaries, particularly slang and pronunciation
dictionaries. 


28.3.1 General-purpose Dictionaries 


A general-purpose dictionary aims to record the general vocabulary of a language 
(including function words), the standard spellings, and usual pronunciations of words, 
and the established meanings and grammatical properties of words (see e.g. Béjoint 
2000 for discussion). According to Tarp (2001), general-purpose monolingual dictionar- 
ies are ‘polyfunctional’, insofar as they simultaneously cover both communication- and
knowledge-oriented functions. That is, their purpose is to help users with the practical 
task of producing or comprehending language but they also represent information about 
the subject matter of language, such as the origins of words. This is perhaps reflected in 
the fact that in all eight larger (unabridged and desk-size) general-purpose dictionar- 
ies reviewed here homonyms are generally identified on historical grounds (American 
Heritage Dictionary of the English Language, fourth edition [2000]; Bloomsbury English 
Dictionary [2004] = Bloomsbury; Chambers Dictionary [2003] = Chambers; Collins 
English Dictionary, ninth edition [2007] = CED9; DK Illustrated Oxford Dictionary
[2003] = DK; Oxford Dictionary of English [2005] = ODE; Random House Dictionary of
the English Language, second edition [1987] = RH2; Webster’s Third New International
Dictionary [1961] = W3). Most of these dictionaries group historically related senses 
belonging to different parts of speech within the same (grammatically structured) entry 
(pattern 3 in Table 28.1). The only exception to this is W3, which, in addition to splitting 

¹junk \ˈjəŋk\ n -s often attrib [ME jonke] 1 a obs : a piece of worn or poor rope or cable [...] 2 a (1) :
old iron, glass, paper, cordage, or other waste that may be treated so as to be used again in some form
[...] (2) : secondhand, worn, or discarded articles of any kind having little or no commercial value [...]

²junk \"\ vt -ed/-ing/-s : to abandon or get rid of as no longer of value or use [...]

³junk \"\ n -s [Pg junco, fr Jav jon] : any of various characteristic boats of Chinese and neighboring
waters [...]


FIGURE 28.1 An excerpt from the entries for junk in W3


club¹ ▶ noun 1 [treated as sing. or pl.] an association dedicated to a particular interest or activity [...]
origin: early 17th century (as a verb): formed obscurely from CLUB²


club² ▶ noun 1 a heavy stick with a thick end, used as a weapon [...]
origin: Middle English: from Old Norse clubba, variant of klumba; related to CLUMP


FIGURE 28.2 Club in the ODE


historical homonyms, also separates senses into different headwords on the basis of part- 
of-speech differences (pattern 4)—see Figure 28.1. 

Although in the other dictionaries grammatical differences do not generally warrant 
the creation of separate headwords, Bloomsbury does separate the open-class senses 
of round (as an adjective, noun, or verb) under one headword, and the grammatical 
closed-class word senses of the form (as a preposition or adverb) under another. Senses 
with different morphological forms (as in the case of hang) are, however, not given sepa- 
rate headword status in any of the dictionaries. 

As noted above, clearly etymologically distinct lexical units (e.g. junk, bat, calf) are 
represented as homonyms in all the larger general-purpose dictionaries surveyed. Not 
all the dictionaries, however, prioritize etymological criteria in all cases. While the other 
dictionaries group the historically related ‘heavy stick’ and ‘a group of people’ senses
of club under the same headword, ODE separates them into different entries, on the 
grounds of their semantic distinctiveness, as Figure 28.2 shows. 

However, in cases where a possible synchronic semantic connection exists between 
etymologically unconnected senses, all the dictionaries defer to the etymology. For 
example, the ‘scrub’ and ‘search’ senses of scour are separated under different homony- 
mous headwords in all the larger general-purpose dictionaries. 

In all these dictionaries, heteronyms like bass or wind are given different headwords. 
The treatment of capitalization differences, however, varies. W3 is well known for rep- 
resenting most headwords—apart from God—with a lower case initial (e.g. Landau 
2009a). This includes nationalities and names of months, whose definitions include 
notes such as usu cap to indicate that the headword is more commonly capitalized. W3 
also groups, for instance, Swede and swede under the same headword. A similar practice 
is followed by DK, RH2, and Chambers (see Figure 28.3). 

Swede /swēd/ n a native or citizen of Sweden in N Europe; (without cap) a Swedish turnip, a large
buff-flowered, glaucous-leaved kind with yellow flesh.


FIGURE 28.3 Swede/swede in Chambers


Similarly, these dictionaries group the ‘hawthorn blossom’ sense of may under the 
same headword as May the month (although the modal verb may is separated). In this 
the dictionaries appear to be prioritizing etymological relatedness over the differences 
in capitalization (swede the vegetable is from Sweden and may the blossom is named 
after the month). However, some inconsistencies can be noted: W3 and RH2 have sepa- 
rate entries for Turkey and turkey, even though the name of the bird is ultimately derived 
from the name of the country. 


28.3.1.1 Smaller General-purpose Dictionaries 


Given that a dictionary’s homonym policy affects the number of headwords it has, crea-
tors of smaller dictionaries have a motivation for reducing the number of homonyms.
However, none of the five smaller dictionaries surveyed here eschews homonymous
headwords altogether. In the dictionaries labelled ‘concise’ (Concise Oxford English
Dictionary, eleventh edition [2004] = COED11 and Collins Concise English Dictionary,
seventh edition [2008]), the division of headwords essentially mirrors that of their larger
cousins (ODE and CED9), as both generally identify homonyms on etymological
grounds. In the pocket-size dictionaries, too, senses of different origins are generally 
represented in different entries—for example, bat, calf, and junk are homonyms in all 
three pocket-size dictionaries surveyed (Collins English Dictionary Express [2005], Collins
Gem Dictionary, 15th edition [2009], and Little Oxford English Dictionary, ninth edi-
tion [2006] = Little Oxford). However, in some cases the space restrictions have led the
smaller dictionaries to reduce homonymous headwords. For instance, all five include
the ‘hawthorn blossom’ sense of may under the same headword as May the month. Little
Oxford also groups some historically distinct senses together. For example, the ‘scrub’ 
and ‘search’ senses of scour are represented under the same headword in this diction- 
ary, but are distinct homonyms in the larger Oxford dictionaries (ODE and COED11); 
similarly for the ‘hearing organ’ and ‘head of corn’ senses of ear. In these cases the deci-
sion to group the senses under the same headword can be justified on the grounds of 
the potential semantic relationship between the senses, but it is not clear why the same 
dictionary also groups the entirely unrelated senses of jumper (‘someone who jumps’ / 
‘pullover’/‘pinafore dress’) in one entry. 


28.3.2 Monolingual Learners’ Dictionaries 


In contrast with the general-purpose dictionaries, historical homonymy is not consid- 
ered relevant in learners’ dictionaries. Learners’ dictionaries often make use of inno- 
vative lexicographic features targeted at the learners’ special needs, such as modified 
definitional styles (see e.g. Heuberger, this volume; Moon, this volume). Many also
avoid homonymous headwords, as the distinction between polysemy and homonymy is 
generally considered to be unimportant for the learner user (Atkins and Rundell 2008). 

The survey here considers four learners’ dictionaries (Cobuild1; Collins Cobuild
Advanced Dictionary, sixth edition [2008] = Cobuild6; Longman Dictionary of
Contemporary English, fifth edition [2009] = LDOCE5; and Oxford Advanced Learner’s
Dictionary, seventh edition [2005] = OALD7). The only one of these to systematically
include homonymous headwords is LDOCE5, which adopts a purely grammatical crite-
rion for the identification of homonyms, ignoring historical or semantic relatedness or
distinctiveness (i.e. pattern 2 in Table 28.1). Thus bank, for example, has two numbered
entries in LDOCE5, one containing all the noun senses (including the ‘financial institu-
tion’ and ‘mound of earth’ senses), and the other all the verb senses (including the ‘put
money in a bank’ and the ‘arrange something in a pile or row’ senses).
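
The four headword-division patterns referred to in this chapter can be thought of as four ways of partitioning the senses of a single lexical form. The following Python sketch is offered only as an illustration of that idea, not as a description of how any dictionary is actually compiled. It assumes that pattern 1 groups everything together, pattern 2 splits by part of speech, pattern 3 splits by historical origin, and pattern 4 splits by both, which is this sketch's reading of the descriptions in the text. The `Sense` class, the `headwords` function, and the origin labels `money` and `slope` are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    gloss: str
    pos: str     # part of speech
    origin: str  # placeholder label for historical origin (an assumption)


# One grouping key per pattern; senses sharing a key share a headword.
PATTERN_KEYS = {
    1: lambda s: (),                 # everything under one entry
    2: lambda s: (s.pos,),           # split on grammatical grounds only
    3: lambda s: (s.origin,),        # split on historical grounds only
    4: lambda s: (s.origin, s.pos),  # split on both
}


def headwords(senses, pattern):
    """Partition senses into headword entries according to the pattern."""
    key = PATTERN_KEYS[pattern]
    groups = {}
    for sense in senses:
        groups.setdefault(key(sense), []).append(sense)
    return list(groups.values())


# The four senses of bank mentioned in the text; origin labels are illustrative.
BANK = [
    Sense("financial institution", "noun", "money"),
    Sense("mound of earth", "noun", "slope"),
    Sense("put money in a bank", "verb", "money"),
    Sense("arrange something in a pile or row", "verb", "slope"),
]

for pattern in (1, 2, 3, 4):
    print(pattern, len(headwords(BANK, pattern)))
```

Under pattern 2 the four senses fall into two entries, one for the nouns and one for the verbs, which corresponds to the two numbered entries for bank described above.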

The other three learners’ dictionaries generally avoid creating distinct entries for 
senses that belong to different parts of speech. Nor do any of them systematically iden- 
tify homonyms on the grounds of etymological or semantic distinctiveness—all treat 
the different senses of tattoo, calf, and ear, for instance, in one entry. As noted above in 
Section 28.2.3, the dictionary to go furthest in its avoidance of homonyms is Cobuild1, 
which has no homonymous headwords whatsoever (i.e. it always follows pattern 1). 
The more recent Cobuild6 has, however, revised this policy somewhat. Although it
also generally favours grouping all senses of a lexical form under a single headword, 
it does divide the senses of bank into three entries, labelled 1: FINANCE AND STORAGE, 
2: AREAS AND MASSES, 3: OTHER VERB USES. It is not clear, however, why other equally
semantically divergent cases (e.g. bat, calf, tattoo, or club) are not similarly represented 
as homonyms. In some cases Cobuild6 also divides senses into different headwords on 
grammatical grounds. There are four headwords for round, labelled 1: PREPOSITION 
AND ADVERB USES, 2: NOUN USES, 3: ADJECTIVE USES, 4: VERB USES. Again, however, 
the basis for representing some lexical units as grammatical homonyms but not others is 
not entirely clear. 

Cobuild6 has also abandoned Cobuild1’s policy of grouping even heteronyms and
cases such as may/May and swede/Swede under a single headword. In Cobuild6, differ- 
ently capitalized words are given separate headword status. The dictionary also sepa- 
rates some heteronyms into different entries (incl. wind and live), but the /beɪs/ and
/bæs/ pronunciations of bass are still covered in one entry. The other learners’ dictionar-
ies are more systematic in separating heteronyms and words with differences in capitali- 
zation into different headwords. OALD7, for instance, has separate entries for may and 
May, although the entry for may includes both the modal verb senses and the unrelated 
‘hawthorn blossom’ noun sense. This is in keeping with the dictionary’s policy of not 
using etymology or semantic distinctiveness as a criterion for homonymy. One could, 
however, argue that grouping highly semantically distinct senses, such as those of may, 
under the same entry may be confusing for the learner. It can make the lexicon of the 
language appear arbitrary for the non-native user whose first language would likely have 
different words for the different senses. Separating the senses into different entries could 
be helpful, as it would show that meaning-wise (and by etymological and grammatical
criteria, too) may is actually two English words. 


28.3.3 Children’s Dictionaries 


According to Malakhovski (1987: 40), children’s dictionaries commonly simplify the 
lexical structure of a language and often avoid including homonymous entries ‘since 
it is generally believed that to introduce the notion of homonymy to young children is 
inappropriate’. However, the evidence regarding children's understanding of homon-
ymy is mixed. There is some evidence that children can struggle with the notion of 
homonymy and that the awareness of homonyms develops gradually with age. For 
example, Whitt and Prentiss (1977) found that 7-year-olds’ comprehension of jokes 
that depend on homonyms was significantly worse than that of 9-year-olds. However,
other studies have shown that some degree of metalinguistic awareness of homo- 
nyms develops quite early. Backscheider and Gelman (1995) showed that children as 
young as 3 are able to pick out pictures of two different kinds of bat and can recognize 
that sharing the same name does not mean that the things are the same kind of thing. 
According to Kohn and Landau (1990), parents often describe the referents of homo- 
nyms in different ways (e.g. It’s a flying bat vs. baseball bat), which may help children
develop an implicit understanding of homonymy. Although recognizing homonyms
is not the same kind of task as looking them up in a dictionary, the fact that even young
children can have an understanding of homonyms suggests that it would not neces- 
sarily be inappropriate for a children’s dictionary to include homonymous headwords. 

An examination of six children’s dictionaries published in the UK showed a tendency 
to avoid homonyms, particularly in dictionaries aimed at the youngest age bracket, 
but the specific policies varied between publishers and dictionaries. The dictionar- 
ies surveyed were Collins Junior Dictionary (2005) = Collins Junior; Oxford Children's 
Colour Dictionary (2006) = Oxford Children’s; Collins School Dictionary, fourth edition 
(2009) = Collins School; Oxford Primary Dictionary (2001) = Oxford Primary; Collins 
Student’s Dictionary (2004) = Collins Student’s; and Oxford School Dictionary, fourth
edition (2005) = Oxford School. Two of the dictionaries were aimed at children from age 
7 upwards (Collins Junior and Oxford Children’s). Neither separates senses into homo- 
nyms on etymological or semantic grounds—for instance, the different senses of calf are 
given under one headword in both dictionaries. But while Collins Junior avoids homon- 
ymous headwords altogether (following pattern 1 in Table 28.1), Oxford Children’s 
dictionary does separate different grammatical classes into different headwords (i.e. 
pattern 2). However, the headwords for the different parts of speech are not numbered 
with super- or subscript numbers. On the other hand, in the other Oxford dictionaries 
aimed at older children (Oxford Primary for ages 8-11 and Oxford School for ages 11-16) 
superscript numbers are used to indicate homonyms that have a different meaning or 
origin. For example, in Oxford Primary, junk¹ includes the ‘rubbish’ sense, while the
‘flat-bottomed boat’ sense is given under junk². But as both Oxford Primary and Oxford

bat¹ ▶ noun (plural bats)
a wooden implement used to hit the ball in cricket, baseball, and other games [...]

bat ▶ verb (bats, batting, batted)
to bat is to use a bat in a ball game


bat² ▶ noun (plural bats)
a flying mammal that looks like a mouse with wings


FIGURE 28.4 bat in Oxford Primary


School also maintain the convention of listing different parts of speech under separate 
unnumbered headwords, these dictionaries end up with some rather complex combina- 
tions of types of headwords—see Figure 28.4. 

While Oxford Primary has different numbered headwords for some historical homo- 
nyms, it does not do so consistently. For bank, for instance, historically unrelated senses 
are grouped together, and senses are only separated on grammatical grounds, with the 
noun and verb senses appearing under separate unnumbered headwords. In contrast, 
Collins School, which is aimed at children of comparable age (ages 10+), does not sepa- 
rate any homonyms on etymological, semantic, or grammatical grounds (although it does 
generally do so for heteronyms). The Collins Student’s Dictionary, which is aimed at the
oldest age group (14+), on the other hand, identifies homonyms on etymological grounds, 
largely mirroring the policy of the general-purpose dictionaries published by Collins. 


28.3.4 Specialist Dictionaries 


Under the label of specialist dictionaries I consider dictionaries restricted to the vocab- 
ulary of a specific subject field, such as psychology or music (see Becker, this volume), 
and also dictionaries that cover a specific aspect of language, in particular pronuncia- 
tion dictionaries (see also Sangster, this volume) and dictionaries of slang (see Coleman, 
this volume).¹

The question of how to deal with historical or semantic homonyms comes up less 
frequently in single-field or technical dictionaries than in other types of dictionaries 
simply due to their restricted coverage of vocabulary. There is no need for Bloomsbury 
Dictionary of Music (1992) to mention any fish-related senses of bass or for the Concise 
Oxford Dictionary of Music (2004) to include the ‘ink mark’ sense of tattoo in addition
to the definition ‘the music of bugles and drums, recalling soldiers to their barracks at 
night’. Technical dictionaries also often avoid dealing with part-of-speech differences 
by only including nominal definitions even if the word is also used as a verb. For exam- 
ple, Concise Oxford Dictionary of Music has three noun senses for slide, but no verb 


1 See McClure, this volume, for discussion of homonymy in surname dictionaries. 
senses. Occasionally, however, a technical dictionary finds cause to include unrelated
senses for a given wordform, but the general tendency in such dictionaries is to group all 
senses under the same headword. For instance, the entry for race in Oxford Dictionary 
of Psychology (2001) includes both the ‘competition’ and ‘subdivision of a species’ senses. 
In contrast, Longman Dictionary of Language Teaching and Applied Linguistics (2002) 
lists each sense of a wordform in its own homonymous lexical entry, regardless of 
whether they are related or not (e.g. accent appears as four numbered headwords). Both 
of these representational formats obscure the polysemy/homonymy distinction, but this 
is appropriate given the intended function of a single-field dictionary. As Tarp (2001) 
notes, the users of a technical dictionary wish to increase their knowledge of the ter- 
minology and concepts of the subject, and thus information about whether particular 
senses are related or not is irrelevant. 

The polysemy/homonymy distinction is also generally not relevant for dictionaries 
restricted to representing a specific aspect of language. Pronouncing dictionaries only 
give the pronunciations of words, not their meanings or etymologies. Consequently, 
they only need to provide different entries for heteronyms, otherwise each wordform 
occurs only once, as illustrated by the entries for bass and bassist from Cambridge 
English Pronouncing Dictionary (2011) in Figure 28.5 (very similar representations of 
these wordforms are given in Oxford Dictionary of Pronunciation for Current English, 
2001 and the Longman Pronunciation Dictionary, 1990). 
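
The policy just described amounts to keeping one entry per distinct pairing of spelling and pronunciation. A minimal sketch of that idea follows, assuming a hypothetical list of (spelling, pronunciation, gloss) records. The function name `pronouncing_entries` and the sample data are illustrative only and are not taken from any of the dictionaries cited.

```python
from collections import Counter


def pronouncing_entries(records):
    """Collapse sense records to one entry per (spelling, pronunciation) pair."""
    entries = {}
    for spelling, pronunciation, gloss in records:
        entries.setdefault((spelling, pronunciation), []).append(gloss)
    return entries


# Hypothetical sample records; glosses serve only to tell the senses apart.
RECORDS = [
    ("bass", "bæs", "fish"),
    ("bass", "bæs", "fibre"),
    ("bass", "bæs", "beer"),
    ("bass", "beɪs", "in music"),
    ("bat", "bæt", "flying mammal"),
    ("bat", "bæt", "implement for hitting a ball"),
]

ENTRIES = pronouncing_entries(RECORDS)

# A spelling that survives with more than one pronunciation is a heteronym.
SPELLING_COUNTS = Counter(spelling for spelling, _ in ENTRIES)
HETERONYMS = sorted(s for s, n in SPELLING_COUNTS.items() if n > 1)
```

With this sample, bass keeps two entries because its pronunciations differ, while the two unrelated senses of bat collapse into one.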

In slang dictionaries, etymologically unrelated senses are also often grouped under 
the same entry. For example, as Figure 28.6 shows, the entry for bat in The Concise New
Partridge Dictionary of Slang and Unconventional English (2008) includes both senses 
derived from the ‘flying mammal’ sense (e.g. senses 1, 2) and those derived from the 
‘stick’ sense (e.g. 5, 8). 

However, separating slang senses into homonymous entries on etymological grounds 
can be difficult, given that the origins of slang are often obscure (consider e.g. the 


bass (B) fish, fibre, beer: bæs
bass in music: beɪs -es -ɪz [...]


bassist ˈbeɪ.sɪst -s -s


FIGURE 28.5 Bass and bassist in Cambridge English Pronouncing Dictionary


bat noun 1 foolish or eccentric person US, 1894. 2 an ugly woman US, 1972. 3 an extended period of
drunkenness CANADA, 1977. 4 a drinking binge US, 1846. 5 a fat marijuana cigarette. Pun on ‘baseball
bat’ as stick US, 1975. 6 a shoe; a slipper. Variant spellings are ‘batt’ and ‘bate’ UK, 1992. 7 male
masturbation AUSTRALIA, 1985. 8 in horse racing, the whip used by the jockey US, 1957. [...]


bat verb to dance on a stage. Also spelled ‘batt’ or ‘bate’ UK, 2002. [...]


FIGURE 28.6 Bat in Concise New Partridge Dictionary of Slang
‘drinking binge’ sense of bat). One slang dictionary that does use etymology as a cri-
terion for homonymy is Green's Dictionary of Slang (2010), which is a slang dictionary 
written on historical principles. It consequently lists the ‘foolish person’ sense under 
bat n.’ and the ‘riding whip’ sense under bat n.” The dictionary similarly separates the 
‘moustache’ sense of stash (short for moustache) under a different headword from the 
‘cache’ sense of the form, which is of obscure origin. 

Senses belonging to different parts of speech are given under different headwords in 
both the Concise New Partridge Dictionary of Slang and in Green's Dictionary of Slang. 
The same policy is also followed by the Dictionary of Contemporary Slang (1997), which 
has separate entries for bad as an adjective meaning ‘good’ and as a noun meaning ‘a 
mistake’ (as in my bad). 


28.4 CONCLUSION 


A dictionary’s policy regarding the identification and representation of homonymy 
depends on the intended audience and range of the dictionary in question. The distinc- 
tion between polysemy and homonymy is generally not relevant for special-purpose 
dictionaries, but continues to inform the headword divisions in general-purpose dic- 
tionaries of all sizes. This is in keeping with lexicographic tradition, but also relates to 
general-purpose dictionaries’ aim of providing a comprehensive description of the lexi-
cal structure of the language. The main criterion for homonym identification adopted in 
the general-purpose dictionaries reviewed here was etymology. In some cases, however, 
semantic distinctiveness or similarity were prioritized over etymological connections 
or lack thereof (e.g. the ODE representation of club or the Little Oxford's treatment of 
scour). 

Some of the dictionaries aimed at children and language learners, on the other hand, 
were shown to have a tendency to avoid homonymous headwords and instead group 
unrelated senses under a single headword. This can be viewed as a justifiable simpli- 
fication, appropriate for a dictionary aimed at users whose language skills are still 
developing. However, as noted, there is some evidence that children may have a more 
sophisticated understanding of homonyms than is sometimes assumed. Furthermore, 
the fact that the more recent Cobuild6 learner’s dictionary has revised Cobuild1’s princi-
ple of total homonym avoidance suggests that this policy was not found to be helpful for 
the learner users. 

Thus, although the distinction between polysemy and homonymy is not always 
clear-cut and can be drawn on the basis of a variety of different criteria, homonymy con- 
tinues to be a significant feature in at least some types of dictionaries.


29.1 INTRODUCTION: THE PROBLEM


This chapter considers the problems posed by representing pronunciation in dictionar-
ies intended for general readers, concentrating on English, as a rather notorious exam-
ple of a language in which there is no simple letter-to-sound mapping from the standard
orthographic forms of words to their pronunciation, and therefore some guidance on
the pronunciation of each word is normally expected by general readers. Some of the
resultant problems are well illustrated by the following passage from Boswell’s Life of
Samuel Johnson:


BOSWELL. ‘It may be of use, Sir, to have a Dictionary to ascertain the pronunciation.’

JOHNSON. ‘Why, Sir, my Dictionary shews you the accents of words, if you can
but remember them.’

BOSWELL. ‘But, Sir, we want marks to ascertain the pronunciation of the vowels.
Sheridan, I believe, has finished such a work.’

JOHNSON. ‘Why, Sir, consider how much easier it is to learn a language by
the ear, than by any marks. Sheridan’s Dictionary may do very well; but you can-
not always carry it about with you: and, when you want the word, you have not the
Dictionary. It is like a man who has a sword that will not draw. It is an admirable
sword, to be sure: but while your enemy is cutting your throat, you are unable to use
it. Besides, Sir, what entitles Sheridan to fix the pronunciation of English? He has, in
the first place, the disadvantage of being an Irishman: and if he says he will fix it after
the example of the best company, why they differ among themselves. I remember an
instance: when I published the Plan for my Dictionary, Lord Chesterfield told me
that the word great should be pronounced so as to rhyme to state; and Sir William
Yonge sent me word that it should be pronounced so as to rhyme to seat, and that
none but an Irishman would pronounce it grait. Now here were two men of the high-
est rank, the one, the best speaker in the House of Lords, the other, the best speaker in
the House of Commons, differing entirely.’¹


¹ Boswell (1791 [1968 I, 405-6], for the year 1772).


For many European languages, for example Czech, Finnish, Hungarian, Slovak, or
Spanish, the problem does not present itself in this form, since the vast majority of
spellings are immediately pronounceable without any difficulty (although the precise
realization may differ in different varieties of the language). At first sight, English is
notoriously chaotic from this point of view: homographs may be differently pronounced
(e.g. lead (verb: ‘conduct’) and lead (noun: a metal)), while homophones may be differ-
ently spelt (e.g. weigh and way). This is particularly noticeable in vowel sound–spelling
equivalences, but is also found, although less commonly, for consonants: thigh (<th> =
/θ/)~thy (<th> = /ð/)~Thai (<Th> = /t/), few~phew, room~rheum.

The exchange between James Boswell and Samuel Johnson gives us two of the prob-
lems facing dictionary writers when they come to consider whether or not to include
pronunciation information: Who decides on the ‘correct’ pronunciation of a word? And
how does that person, however chosen, make a decision on what the pronunciation
model should be? Johnson does not consider the alternative solution of showing more
than one pronunciation for any word, but simply decides that he should not be the arbi-
ter, and so shows none.

A question not addressed by Johnson and Boswell in this passage is how to represent
the pronunciation when once it has been decided upon. This chapter will examine some
of the methods by which lexicographers have tried, with greater or less success, to solve
this problem, taking as its example general dictionaries of English published between the
mid-eighteenth century and the present day.


29.1.1 The Problem of English Orthography


The need for some sort of pronunciation advice arises of course from the nature of
English orthography. Unlike many other languages, English has spelling that has grown
up like a layer cake: the base is Old English, mostly the Wessex dialect, to which have
been added some Scandinavian (Viking) traditions; borrowings from Norman French,
which had its own spelling conventions, sometimes at odds with the Old English;
then the influence of French spelling conventions on native Anglo-Saxon words (e.g.
<ou> for /uː/ as in house for older hus); followed by words borrowed from all sorts of
languages, usually retaining the spelling where that had been in a roman alphabet (not
always: jaunty, for instance, is borrowed from French ‘gentil’), but generally attempting
to imitate the pronunciation of the donor language. This was compounded by the Great
Vowel Shift (GVS), which changed the pronunciation of many existing English words
without the spelling following suit (understandably, as the speakers would be unaware
that their pronunciation was changing), with subsequent borrowings using the same 
spelling conventions but now with different sound values, for example doubt (earlier 
doute) borrowed from French before the GVS, and now pronounced /daʊt/, but route,
also borrowed from French, but after the GVS, pronounced /ruːt/. Contrast this with
the practice adopted by a language such as Spanish, which changes the spellings of bor-
rowed words to conform to Spanish norms, for example fútbol for ‘football’ (compare
French, which has retained the English spelling football). The consequence of these 
developments in English is that the same spellings can be pronounced in various ways, 
and that different spellings can be pronounced identically. 


29.1.2 The Problem of Stress in English 


Spanish has quite sensibly opted to indicate stress position by placing an acute accent
above the vowel of the stressed syllable where this is different from some very simple
rules of thumb, e.g. cántara (‘water jug’), cantara (penultimate stress, ‘he or she would
sing’), cantará (‘he or she will sing’). English has chosen not to show stress placement
in any way. An additional problem that English causes in this area, which does not
occur in Spanish and many other languages, is that unstressed vowels are frequently
reduced either to schwa or to a close-ish vowel, /ɪ~i/ or /ʊ~u/, so that related words
with different stress patterns retain the same basic spelling, for example photograph
/ˈfəʊtəɡrɑːf/ ~ photographer /fəˈtɒɡrəfə(r)/ ~ photographic /ˌfəʊtəˈɡræfɪk/ ~ photogravure
/ˌfəʊtəɡrəˈvjʊə(r)/, thus giving the reader no clue to the pronunciation.
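The Spanish convention can be made concrete with a short sketch. The text does not spell out the ‘rules of thumb’; the version assumed here is the standard orthographic one (a word ending in a vowel, <n>, or <s> is stressed on the penultimate syllable, any other word on the final syllable, and a written acute accent overrides both). The function name and the pre-syllabified input are illustrative assumptions, not part of any dictionary's method.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent


def has_written_accent(syllable):
    # Decompose so that 'á' becomes 'a' followed by a combining acute.
    return ACUTE in unicodedata.normalize("NFD", syllable)


def stressed_index(syllables):
    """Return the index of the stressed syllable of a syllabified Spanish word.

    Assumed rules of thumb (not quoted from the text):
    1. a written acute accent always marks the stressed syllable;
    2. otherwise, words ending in a vowel, n, or s take penultimate stress;
    3. all other words take final stress.
    """
    for i, syllable in enumerate(syllables):
        if has_written_accent(syllable):
            return i
    if len(syllables) == 1:
        return 0
    last_letter = syllables[-1][-1].lower()
    if last_letter in "aeiouns":
        return len(syllables) - 2
    return len(syllables) - 1


for word in (["cán", "ta", "ra"], ["can", "ta", "ra"], ["can", "ta", "rá"]):
    print("".join(word), "-> stressed syllable:", word[stressed_index(word)])
```

Because the spelling carries the accent only where the default rules fail, the three forms cántara, cantara, and cantará are fully distinguished on the page, which is exactly what English spelling cannot do for photograph and photographer.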

Stress in English is often said to be ‘fixed and free’, by which linguists mean that
although the position of the main stress of a word is not fixed at the same point in every 
word (e.g. invariably on the first syllable, as in Czech, Hungarian, or Finnish, or the 
penultimate, as in Polish), it is fixed for each individual word, so that we cannot shift 
the stress around at a whim, unlike French, which has no word stress, and in which any 
syllable may be stressed for expressive reasons. For most words at any one time, this is 
true, but there are constantly many words whose stress pattern is in a state of flux, and so 
uncertain.² Additionally, English does allow the shifting of stress for contrastive
emphasis, so that it is permissible to say, for instance, ‘I meant revenge, not avenge’. In Spanish,
which truly has fixed and free stress, this is not permitted. Consequently, many pairs of 
words, traditionally spelt the same but distinguished by their stress placement, can eas- 
ily become confused. 


2 One of the most frequent complaints to the BBC about the use of English in broadcasting has 
always been about the supposed mis-stressing of words, of which controversy is one of the most famous 
examples (while originally listeners and viewers complained that second syllable stress was ‘wrong’, they
are now equally likely to complain that stress on the first syllable is wrong). 


29.1.3 The Problem of the Model


English is spoken in many varieties, dialects, and accents. Which, if any, of these should 
lexicographers base their pronunciations on? Should they choose a single model, or 
should they give a range of pronunciations and try to cover the waterfront? 

In the eighteenth century standards in pronunciation were becoming increas- 
ingly important from a social point of view. Boswell reports on Sheridan’s having 
taught pronunciation to Mr Alexander Wedderburn ‘and though it was too late in 
life for a Caledonian to acquire the genuine English cadence, yet so successful were 
Mr Wedderburn’s instructors, and his own unabating endeavours, that he got rid 
of the coarse part of his Scotch accent, retaining only as much of the “native
wood-note wild”, as to mark his country; which, if any Scotchman should affect to forget, I
should heartily despise him’ (1791 [1968: I, 242-3], for the year 1763). Johnson always
spoke with a Staffordshire accent, and his friend the actor David Garrick (who
originally travelled to London with Johnson from Lichfield) joked about it.³ However,
Johnson evidently felt no need to change his usage, and while he commented to
Boswell that Sheridan had ‘the disadvantage of being an Irishman’, Sheridan’s accent
did not disqualify him from the best society either, and neither did Boswell’s
Scottish accent. Clearly a wide variety of pronunciations was in use among
well-educated people, and so Johnson could be justified in not wishing to choose among
them. The later eighteenth-century dictionary writers, or orthoépists, were not so
modest, and Sheridan, Walker, and others happily offered a single pronunciation for
each word.

The American dictionary writer Joseph Emerson Worcester, on the other hand, 
writing in 1847, admitted to variation among the ‘best’ authorities: ‘The number of
English words respecting the pronunciation of which there is any important
difference, may be stated at about 2000 ... There is much difference in the
pronunciation of many of these words, both among the best orthoépists, and among the
best speakers of the language’ (Worcester, A Dictionary of the English Language
1848: 5). However, Worcester omits to tell us who he considers are ‘the best
speakers of the language’.

In the course of the nineteenth century, attitudes towards regional accents hardened, 
and one English pronunciation came to be seen as the ‘best’, although with some regional
idiosyncrasies: ‘In the present day we may, however, recognize a received
pronunciation all over the country, not widely differing in any particular locality, and admitting
a certain degree of variety. It may be especially considered as the educated
pronunciation of the metropolis, of the court, the pulpit, and the bar’ (Ellis 1869: 23). (This seems,
incidentally, to be the earliest use of the phrase ‘received pronunciation’.) James Murray


3 ‘Johnson himself never got entirely free of his provincial accent. Garrick sometimes used to take him
off, squeezing a lemon into a punch-bowl, with uncouth gesticulations, looking round the company, and 
calling out “Who's for poonsh?”’ (Boswell 1791 [1968: II, 44], for the year 1776). 

wrestled with the problem of a model as editor of the Oxford English Dictionary in the
late nineteenth century, but MacMahon came to the conclusion that ‘On the available
evidence, both published and unpublished, there is no means of determining precisely
how many and which accents of English are subsumed in the notation’ and ‘it is evident
that Murray was unwilling to set up any one idiolect or accent of educated English as his
model’ (MacMahon 1985: 78, 80).

At the present day, there is once again far more acceptance of a range of accents in 
English. It is no longer unusual to hear Scottish, Irish, or overseas accents of English 
from radio and television news broadcasters, where once RP reigned supreme. Public 
attitudes to non-RP have changed even since the 1980s, when a newsreader on BBC 
national radio with a Scottish accent could receive virulent hate mail simply on the 
grounds of accent. If dictionaries are to be comprehensive, they need to reflect this 
change, and show these differences in pronunciation. 


29.1.4 The Problem of the Representation of Pronunciation


Most of the consonant letters and groups used in English cause little problem: <b, h,
j, k, l, m, p, v, w, sh> always represent the same ‘sound’ when they occur alone, except
when they are silent (as in debt, heir, knock, half, receipt, wry). However, there are more 
consonant phonemes than letters to represent them, so some solution has to be found to 
augment the number of characters, or character groups. 

Vowels are a completely different matter: we have five letters: <a, e, i, o, u>, or six if we
include <y>, but whichever variety or accent of English we choose, there are far more 
vowel phonemes than six. Many digraphs and trigraphs are therefore used to supple- 
ment the individual letters. 

Each symbol, or group of symbols, chosen by the lexicographer to represent pro- 
nunciation should represent only one phoneme, in order for there to be no ambiguity 
over pronunciation. However, is it also necessary that every phoneme be represented 
by its own unique symbol—in other words, is it permissible to represent the same 
phoneme by more than one symbol or group of symbols? This is not a trivial ques- 
tion: English orthography, particularly for vowels, has many letters and groups of 
letters that may be interpreted in more than one way: <a>, for instance, may be /æ/
(man), /e/ (many), /ɪ/ (image), /ɑː/ (father), /ɒ/ (what), /ɔː/ (all), /eɪ/ (mate), /eə/
(mare), or /ə/ (among). Each of these needs its own symbol, and <a> must represent
only one. But conversely, many English vowel phonemes are represented by multiple
spellings: /iː/ appears as <æ> or <ae> (Cæsar, Caesar), <e> (me), <ea> (beam), <ee>
(meet), <ei> (receive), <eo> (people), <i> (machine), <ie> (fiend), <oe> (oestrogen). Is
it necessary to choose a single symbol to replace all these spellings, or can more than 
one be used (always provided that it is not simultaneously used to represent another 
phoneme)? 
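
The requirement discussed above can be stated as a simple check: a notation is unambiguous so long as no symbol stands for more than one phoneme, whether or not several symbols share a phoneme. The following is a minimal sketch of that check; the function name and the small `respelling` inventory are hypothetical illustrations, and the `orthography` pairs are a subset of the examples given in the text.

```python
from collections import defaultdict


def ambiguous_symbols(pairs):
    """Return every symbol that is paired with more than one phoneme.

    `pairs` is an iterable of (symbol, phoneme) tuples. An empty result
    means a reader can always recover the phoneme from the symbol, even
    if some phonemes are written with several different symbols.
    """
    phonemes_by_symbol = defaultdict(set)
    for symbol, phoneme in pairs:
        phonemes_by_symbol[symbol].add(phoneme)
    return {
        symbol: sorted(phonemes)
        for symbol, phonemes in phonemes_by_symbol.items()
        if len(phonemes) > 1
    }


# Ordinary spelling: <a> stands for several phonemes (examples from the text).
orthography = [
    ("a", "æ"), ("a", "e"), ("a", "ɑː"), ("a", "eɪ"),
    ("ee", "iː"), ("ea", "iː"), ("ie", "iː"),
]

# A hypothetical respelling: two symbols share /iː/, but none is ambiguous.
respelling = [
    ("a", "æ"), ("ah", "ɑː"), ("ay", "eɪ"), ("ee", "iː"), ("ea", "iː"),
]

print(ambiguous_symbols(orthography))
print(ambiguous_symbols(respelling))
```

The second inventory passes the check although /iː/ has two spellings in it, which is the situation the closing question of this section asks about.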


29.2 THE SOLUTION


29.2.1 Impressionistic Attempts 


In the early days of dictionary writing, there was no objective phonetic alphabet avail- 
able to the lexicographers, and they were forced either to invent a whole new alphabet 
for themselves, as Spence did (see Section 29.2.2.1), or to make the best job they could 
of the roman alphabet, by respelling words in a regular fashion. Sheridan (A General 
Dictionary of the English Language, 1780) explicitly rejects the idea of reforming the 
alphabet: ‘no alterations in that respect, productive of any real benefit, can be made, 
without new moulding our alphabet, and making a considerable addition to its charac- 
ters; a point utterly impracticable’ (1780: xvii). Was he aware of Spence’s attempt in this 
direction? In the twentieth and twenty-first centuries, once phonetic alphabets were 
available, some lexicographers still preferred to use respelling, claiming that it was 
easier for their readers. As recently as 2011, we find the editor of Chambers Dictionary 
(twelfth edition) writing ‘A respelling system has been used in this dictionary. It is a 
method that is intelligible to people who are not familiar with phonetic symbols, and 
one that allows for more than one interpretation—so that each user of the dictionary 
may choose a pronunciation in keeping with his or her speech’ (2011: xv). This is some- 
what ingenuous: Chambers’ respelling system retains <r> wherever it appears in the 
traditional orthography. This does, in accordance with the editor’s claim, allow users 
to choose to pronounce /r/ if they are rhotic, or to omit it if they are not, so that
farther (respelled as ‘fär'dhər’) may be pronounced either /ˈfɑːrðər/ or /ˈfɑːðə/. But the
respelling inventory does not include a single symbol which may be interpreted as
either /æ/ or /ɑː/, so that the BATH⁴ words, which distinguish speakers from the
southern and northern halves of England (northerners using the same vowel as in the TRAP
set, but southerners the same as in the PALM set), are all shown with their southern
pronunciation alone: /bɑːθ/. Interestingly, in the Mid-Century Edition (1952), edited by
William Geddie, a symbol was provided for an <a> that could be either ‘long’ or ‘short’ 
(Geddie's terms), ‘4, but by 1982, under the editorship of E. M. Kirkpatrick, this had 
been dropped again. 


29.2.1.1 How to Mark Stress 


The method which most lexicographers, from the eighteenth century onwards, have 
chosen to use is an acute accent placed after the stress. Usually it has been placed after 
the stressed syllable, but quite often it has come immediately after the vowel of the 
stressed syllable. So, for instance, we find ‘a´bsence’ (Fenning The Royal English
dictionary 1763, Concise Oxford Dictionary, seventh edition 1981), and ‘ab´sence’ (Walker


4 As defined by Wells (1982). 

A critical pronouncing dictionary, and expositor of the English language, revised
edition 1818, Merriam-Webster New International Dictionary, 1934 and Webster's Third
New International Dictionary, 1961, Chambers’s Twentieth Century Dictionary 1982).
The earliest Chambers’s Twentieth Century Dictionary placed it even later: ‘abs´ence’.
Some dictionaries have placed the acute accent above the vowel of the stressed
syllable: ‘ábsence’ (Johnston A Pronouncing and Spelling Dictionary 1772), ‘ábsens’
(respelling used in Wyld The Universal Dictionary of the English Language 1932).
Chambers English Dictionary (2011 edition), in a nod towards phonetic transcrip- 
tion rather than respelling, has replaced its acute accent by a vertical stroke, exactly 
like the IPA stress mark, but still placed after the stressed syllable rather than before 
it. The Penguin English Dictionary (third edition, 2007), however, does use the IPA 
convention of a vertical stroke placed before the stressed syllable. Other dictionar- 
ies have used typographical devices, such as italics or bold face type to indicate the 
stressed syllable. 
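
The placements described above differ only in where the mark goes relative to the stressed syllable, which a short sketch can make explicit. The function and style names below are invented for illustration, the syllable division is supplied by the caller (the dictionaries cited divide absence differently), and the spacing acute ´ stands in for whatever mark a given printer actually used.

```python
import unicodedata

VOWELS = "aeiouy"


def mark_stress(syllables, stressed, style):
    """Return the word with its stressed syllable marked in one of five styles.

    syllables: list of orthographic syllables, e.g. ["ab", "sence"]
    stressed:  index of the stressed syllable
    style:     'after_vowel', 'after_syllable', 'over_vowel',
               'stroke_after_syllable', or 'ipa' (stroke before the syllable)
    """
    syllable = syllables[stressed]
    # Position of the first vowel letter in the stressed syllable.
    pos = next((i for i, ch in enumerate(syllable) if ch.lower() in VOWELS), None)
    if pos is None:
        raise ValueError("stressed syllable contains no vowel letter")

    if style == "after_vowel":
        marked = syllable[:pos + 1] + "´" + syllable[pos + 1:]
    elif style == "after_syllable":
        marked = syllable + "´"
    elif style == "over_vowel":
        # Attach a combining acute to the vowel, then compose (a + ´ -> á).
        combined = syllable[:pos + 1] + "\u0301" + syllable[pos + 1:]
        marked = unicodedata.normalize("NFC", combined)
    elif style == "stroke_after_syllable":
        marked = syllable + "ˈ"
    elif style == "ipa":
        marked = "ˈ" + syllable
    else:
        raise ValueError(f"unknown style: {style}")

    parts = list(syllables)
    parts[stressed] = marked
    return "".join(parts)


for style in ("after_vowel", "after_syllable", "over_vowel",
              "stroke_after_syllable", "ipa"):
    print(style, mark_stress(["ab", "sence"], 0, style))
```

Passing the division `["abs", "ence"]` with the `after_syllable` style gives the later placement attributed above to the earliest Chambers’s Twentieth Century Dictionary.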


29.2.1.2 Resolving the Vowel Problem 


Boswell asks for ‘marks to ascertain the pronunciation of the vowels’. He does not
want a complete respelling of the word, and he appears to be quite happy with the
way consonants behave. Many dictionaries have followed Boswell’s wishes: the
vowels of headwords have been weighed down with diacritics to show their
pronunciation. The most obvious distinction is to divide vowels first into long and short,
following Latin, and using the same diacritics as when marking Latin vowels. The
short vowels of bat, bet, bit, and cot, are straightforward: ‘ă’, ‘ĕ’, ‘ĭ’, ‘ŏ’, but ‘ŭ’ causes
a problem: is this the vowel of but, or put? The solution, ever since the two vowels
split, has been to treat it as the vowel of but. As mentioned above, the Great Vowel
Shift had played havoc with the pronunciation of the long vowels of English, but
Latin was still, in the eighteenth and early nineteenth centuries, pronounced as it
had been for centuries before, that is, with the equivalent English vowels, so Latin
‘ā’, ‘ē’, ‘ī’, ‘ō’, and ‘ū’ were now pronounced /eɪ, iː, aɪ, əʊ/ and /juː/ (the last of these
was the nearest English speakers could get to the French /y/, which was used for
the name of the letter rather than the ME /uː/, which had become /aʊ/ as a result
of the GVS). That accounted for nine of the vowels and diphthongs (/juː/ is
considered to be a semi-vowel followed by a vowel, rather than as a diphthong), and
still does. Remarkably, although some British dictionaries have gone over to an IPA 
transcription in recent years (e.g. the Oxford range down to the Concise, and Collins 
English Dictionary), there are some very popular and respectable dictionaries that 
still use these diacritics today, in the same way as they were employed 250 years ago 
when they were introduced in the first dictionary to show pronunciations, in James 
Buchanan's Linguae Britannicae Vera Pronunciatio: or A New English Dictionary of 
1757. For example, if we compare the pronunciation of five words in Buchanan and 
in editions of the Concise Oxford Dictionary from 1982 and of Chambers Dictionary 
from 2011, we find: 

            Buchanan (1757)    COD (7th edn, 1982)    Chambers (2011)

mate        māte               māte                   māt
menial      mē´nial            mē´nial                mēni-əl
mite        mīte               mīte                   mīt
mote        mōte               mōte                   mōt
mute        mūte               mūte                   mūt


The remaining vowels: /ɑː, ɔː, ʊ, uː, ʌ, ə, ɜː, aʊ, ɔɪ/ and the diphthongs created by the
elimination of post-vocalic /r/ in non-rhotic accents have caused more difficulties, and
have been dealt with variously, according to the whim of the editor of each dictionary.
Of these /ɔɪ/ is the simplest, as the traditional orthography is invariably either <oi> or
<oy>. There is no spelling of /aʊ/ that is unambiguous: both <ou> (as in house) and
<ow> (as in now) may represent other vowels or diphthongs. Likewise /ʊ, uː/ and /ʌ/
form a complex group, and /ɔː/ may be spelt with either an <a> (as in law), or an <o>
(as in fought). ‘ā’ is still used for /eə/ in words such as mare, and similarly ‘ē’, ‘ī’, ‘ō’, and ‘ū’
in fear, fire, four, and fury. Dictionaries that use this sort of diacritical marking have to
acknowledge the rise of non-rhotic accents in Britain in their introductory remarks:
the unpronounced <r> is still written in every headword, and repeated in the
pronunciation advice. To represent the other vowels, editors have had recourse to a variety of
diacritics: diaeresis, circumflex, inverted breves, diacritics sometimes extending above
more than one letter, and sometimes below the letter as well as, or instead of, above it,
and even ‘double-decker’ diacritics, one above the other. Inevitably the hardest vowel
to represent has been schwa, which in the earliest dictionaries was either ignored
completely, or equated with the short ‘ŭ’ of but: ‘ĕdūkā´shŭn’ would be a typical
respelling for education. Most occurrences of schwa were assumed to be versions of the short
vowels, ‘ă’, ‘ĕ’, ‘ŏ’.

However much lexicographers may want to add the ‘marks’ for ascertaining the pro- 
nunciation of the vowels without respelling the headwords, and so taking up twice as 
much space on the page, they have all had to resort to this in some cases at least. The 
words bury and busy, for instance, cannot be marked to be pronounced /ˈberi, ˈbɪzi/
without using different symbols for the /e/ and /ɪ/. To create new diacritics for the <u>
for these special cases would be intolerable. 

Some editors have opted to represent the same phoneme with more than one symbol 
or symbol group: for instance, the digraph <ee> is accepted everywhere as equivalent 
to ‘ē’, so not requiring a respelling. Similarly, <ay> is treated automatically as being the
same as ‘ā’.

Most commonly, editors have used a combination of diacritics and respellings to 
show pronunciations, but some have preferred the respelling to diacritics, as being 
less alien to English readers. The 2007 edition of the Penguin English Dictionary for 
instance uses no diacritics. It gives strictly non-rhotic transcriptions, and makes 
use of schwa in the centring diphthongs. The full set of respellings is, with their IPA
equivalents:


/iː/  ee      /ʊ/   oo      /eɪ/  ay      /ɪə/   iə
/ɪ/   i       /uː/  ooh     /aɪ/  ie      /eə/   eə
/e/   e       /ʌ/   u       /ɔɪ/  oy      /ʊə/   ooə
/æ/   a       /ə/   uh      /əʊ/  oh      /aɪə/  ie-ə
/ɑː/  ah      /ɜː/  uh      /aʊ/  ow      /aʊə/  owə
/ɒ/   o
/ɔː/  aw


So, we see from this that ‘h’ is used as a length marker in ‘ah’, ‘ooh’, and ‘oh’. As ‘ee’ is used
for one of the long vowels, ‘aa’ could have been used in place of ‘ah’. ‘ie’ is not intuitive
for /aɪ/ (it is pronounced /iː/ just as often), and ‘ooh’ looks awkward, although it is
difficult to see how it could have been avoided, but all the other transcriptions are easily
understood by the layman with no phonetic training. Oddly, the transcription of foreign
words uses IPA symbols, and yet one would think that there would be more need to 
provide an easily legible pronunciation for these words than for any others. The deliber- 
ate use of a non-rhotic model of pronunciation means that the editors cannot claim, as 
Chambers’ editors do, that the respelling allows readers to interpret the pronunciations 
according to their own speech. 

Although the Preface to the Concise Oxford Dictionary, eighth edition (1990) states 
that the IPA is ‘newly adopted in this edition (as in the latest editions of the smaller 
Pocket and Little Oxford Dictionaries)’ (also quoted in Hacker 2012), the tenth edi- 
tion of the Pocket Oxford Dictionary of Current English (2005), the ninth edition of 
the Little Oxford English Dictionary (2006) and the twelfth edition of the Oxford 
Paperback Dictionary (2012) all use (the same) respelling system without diacrit- 
ics, marking stress by a bold typeface. Was R. E. Allen, editor of the Concise Oxford 
Dictionary, eighth edition, presuming too much about the future of these smaller dic- 
tionaries, or has the practice for them changed again? Many words are not re-spelled, 
leading me to assume that the dictionaries are intended for native speakers, but the 
respelling used allows for both rhotic and non-rhotic accents (calm would be
transcribed as ‘kahm’, but harm as ‘harm’). Speakers of North West English, which has
no separate /ŋ/ phoneme (all occurrences of <ng> are pronounced /ŋɡ/ in these
accents), are accommodated by the non-distinction of finger and singer (finger = ‘fin-ger’,
and singer is assumed under ‘sing’). But this does not help other speakers, who are
given no help with banger or anger.

Some early writers—Kenrick (A New Dictionary of the English Language 1773), 
Sheridan (A General Dictionary of the English Language 1780), and Walker (A criti- 
cal pronouncing dictionary, and expositor of the English language 1791)—were more 
innovative, but their approach did not last, and by 1836 Walker's dictionary was being 
re-issued with a more traditional representation of pronunciation using diacritics. All
these writers gave numbers to each of the vowels, and instead of respelling the words, 
they simply placed the vowel’s number above it. Kenrick recognizes sixteen vowels, 
and numbers them 1-16. He also uses 0 to mark unpronounced vowel letters and some 
instances of schwa, such as the <e> of castle or garden, and the second <e> of every: 
CASTLE’, GAMRDEON, ESVE°RY®. Sheridan and Walker start their count again 
with each new vowel letter, so Sheridan has a¹⁻³, e¹⁻³, i¹⁻³, o¹⁻³, and u¹⁻³. Additionally he
has y¹⁻², giving seventeen vowels, but he admits that these represent only nine sounds
because there are ‘pairs’ that represent the same sound, such as a² (hate) and e² (bear).
Walker lists fifteen vowels: a¹⁻⁴, e¹⁻², i¹⁻², o¹⁻⁴, and u¹⁻³, of which only a³ (as in fall) and o³
(as in nor) overlap.

This was an ingenious method, especially in the version devised by Kenrick, as it obvi- 
ated the need to respell anything—whatever the spelling, the vowel number explained it; 
but readers must have found it too difficult, and so it was dropped. 


29.2.1.3 Problems with consonants 


Most of the consonant letters and groups used in English cause little problem: <b, h,
j, k, l, m, p, v, w, sh> always represent the same ‘sound’ when they occur alone, except
when they are silent (as in debt, heir, knock, half, receipt, wry). However, there are more
consonant phonemes than letters to represent them, so some solution has to be found to
augment the number of characters, or character groups. <c> can always be replaced by
either ‘k’ or ‘s’, and even without replacing it, the rules for its interpretation are simple,
with very few exceptions: before <ae> it is /s/, for example caesura, and in the
exceptional words sceptic and celtic, it is /k/. Otherwise <c> represents /s/ before <e> or <i>
(cent, city), /ʃ/ before unstressed <iV> (optician, coercion), and /k/ elsewhere. <ch>,
usually /tʃ/, may be /k/ or /ʃ/ even in the same orthographic context: machismo,
machination, machine respectively. There is no traditional spelling for the sound /ʒ/ of measure,
but by analogy with /ʃ/, spelt <sh>, ‘zh’ has been used for many years, which shows that
even the early orthoépists knew the relationship between these two sounds. The voiced
and voiceless <th>, however, were troublesome at first: the difference between them was
not properly recognized, and some editors disregarded it. Where it was acknowledged
as a problem, some chose to write one of them in italics, or in capitals, and eventually
when it became clear that the difference between them was the same as that between /ʃ/
and /ʒ/, ‘dh’ was used for the voiced fricative.
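
The rules for <c> given above are simple enough to be written out as a short procedure. The sketch below encodes only the rules stated in this paragraph and nothing else (it does not handle <ch>, doubled <cc>, or <c> before <y>); the function name and the `i_unstressed` flag are assumptions introduced here, since the spelling alone does not reveal whether a following <i> is stressed.

```python
K_EXCEPTIONS = {"sceptic", "celtic"}  # the two exceptional words named in the text
VOWEL_LETTERS = set("aeiou")


def value_of_c(word, i, i_unstressed=True):
    """Return the phoneme for the letter <c> at position i of `word`.

    i_unstressed tells the function whether an <i> directly after the <c>
    is unstressed; the caller must supply this (hypothetical parameter).
    """
    w = word.lower()
    if w[i] != "c":
        raise ValueError("position i must point at the letter c")
    rest = w[i + 1:]
    if rest.startswith("h"):
        raise ValueError("<ch> is a separate case and is not handled here")
    if w in K_EXCEPTIONS:
        return "k"
    if rest.startswith("ae"):
        return "s"  # caesura
    if (len(rest) >= 2 and rest[0] == "i"
            and rest[1] in VOWEL_LETTERS and i_unstressed):
        return "ʃ"  # optician, coercion
    if rest[:1] in ("e", "i"):
        return "s"  # cent, city
    return "k"  # everywhere else


for word, i in [("cent", 0), ("city", 0), ("optician", 4),
                ("coercion", 0), ("coercion", 4), ("caesura", 0),
                ("sceptic", 1), ("celtic", 0)]:
    print(word, i, value_of_c(word, i))
```

A word such as society, where the <i> after <c> is stressed, needs `i_unstressed=False` to come out as /s/, which shows why even these ‘simple’ rules cannot be applied from the spelling alone.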

The Collins Dictionary and Thesaurus of the English Language (fifth edition, 2011) 
also respells words: ‘Pronunciations are given in brackets for words that are difficult 
or confusing; the word is re-spelt as it is pronounced, with the stressed syllable in bold 
type, e.g. chaperone (shap-per-rone)’ (2011: vii). This seems an odd example to use, as 
it implies that both the <p> and the <r> are pronounced twice: in both syllable initial 
and syllable final position. However, the quality of the vowels cannot be shown unam- 
biguously if one or other of these consonants is omitted. On the other hand, the Collins 
English Dictionary, first published in 1979, and available online from 2012, uses an IPA 
transcription. Chaperone is transcribed as /ˈʃæpəˌrəʊn/.

Cassell’s English Dictionary (2000) is another recent dictionary to retain respelling: ‘to
provide a compromise between accuracy and understanding by the majority of users.
As few specialized phonetic symbols and additional accents or marks on letters are
used as will fulfil this aim’ (2000: xi). The pronunciation is that of the ‘ “ordinary
educated English speaker”, which some users will recognize under the label of “Received
Pronunciation” ’ (2000: xi). It keeps strictly to a non-rhotic model: schwa is used to
represent both the neutral vowel in isolation (e.g. above, ‘əbŭv´’), and the reflex of
orthographic <r> in diphthongs: pure, ‘pūə’.


29.2.2 Phonetic Transcriptions 


29.2.2.1 Thomas Spence 


Thomas Spence’s The Grand Repository of the English Language (1775) was published 
in Newcastle-upon-Tyne, and so probably not generally available elsewhere. Only two 
copies are now known to exist—one in Newcastle-upon-Tyne, the other in Boston, 
Massachusetts (Beal 2009). 

There had already been many attempts by English phoneticians and orthoépists to 
devise a new spelling system (see Abercrombie 1948), but none had yet been used in a 
dictionary. Spence was a political agitator and is credited with being the first to use
the phrase ‘the rights of man’, predating Thomas Paine. He obviously felt that it was
important for ‘correct’ pronunciation to be taught in order to permit people to advance 
in their social life—perhaps he was aware that his own provincial speech could have 
debarred him from advancement, so he was probably very sensitive on this point, and 
not so thick-skinned as Johnson, who could brush aside any criticism. 

In his dictionary, Spence indicated the pronunciations by means of his own new 
alphabet, which he also used instead of traditional orthography in some of his political 
writings. Each of the headwords is first given in traditional orthography with an ini- 
tial capital, and the stress is indicated in polysyllabic words by placing an acute accent 
over the vowel of the stressed syllable (or the first vowel of a digraph), or following the 
stressed vowel when this is the initial letter of the word, and so a capital. For instance, the 
first page of E includes these words: 


E´ffort, E´ighty, Ejáculate.


This is followed by his phonetic version, in capitals. His new alphabet consists of 
forty symbols: fifteen vowels and diphthongs, and twenty-five consonants. From 
our modern inventory, equivalents for the following are missing: /9:, 3:, 9/ and the 
centring diphthongs, which are represented as V+/r/. On the other hand, the
symbol ‘w represents /juː/. All the currently recognized consonants are included. He
betrays his northernness by not distinguishing /ʊ/ from /ʌ/, and also includes a
symbol for /ʍ/.


29.2.2.2 The Oxford English Dictionary


When the Philological Society commissioned a new dictionary—the New English 
Dictionary on Historical Principles (better known today as the Oxford English Dictionary), 
its early editor, James Murray, wrestled with the problem of how to represent pronuncia- 
tion. It is unnecessary for me to describe in great detail either his problems or his solu- 
tions as this has been comprehensively documented by MacMahon (1985). 

The advances made in phonetic analysis during the course of the nineteenth cen- 
tury meant that transcription systems were being devised to distinguish very small 
differences in the articulatory or auditory nature of sounds, and in particular vowels. 
Alexander Ellis, Isaac Pitman, Melville Bell, Henry Sweet, and others all produced pho- 
netic alphabets, and Murray was in contact with them all. As a consequence, the pho- 
nological principle behind the eighteenth- and earlier nineteenth-century orthoépists’ 
solutions seems to have been lost, and Murray may have been trying to do too much in 
his work for the OED. Unfortunately, as MacMahon makes clear, Murray nowhere states
in his Introduction to the Dictionary what accent or group of accents he is
describing, and the only comments about it directly attributable to Murray were found by
MacMahon in a letter to Thomas Hallam: he was indicating ‘the general outline of
recognised pronunciation’ and ‘Pronunciation in the broad sense’ (MacMahon 1985: 78). His
vowel symbols alone amount to seventy-two, of which nine are for ‘foreign’ sounds, and
one (the apostrophe, what Murray calls the ‘voice glide’) indicates that the following
consonant is syllabic. As an example of the range of vowel symbols used in unstressed
syllables, the three words carat, caret, and carrot are transcribed as [kæ·răt], [kæ·rĕt],
and [kæ·rət] respectively. He also introduces a separate symbol for the vowel of pass: [ɑ],
which is to be interpreted as /æ/ or /ɑː/ according to the accent of the speaker (calm is
[kām], while hat is [hæt]). MacMahon’s conclusion is that Murray was trying to evolve
‘a particular type of diaphonetic notation for educated varieties of English’ (MacMahon
1985: 83). On the other hand, the only ‘superfluous’ consonant symbol is his distinction
between pre-vocalic and post-vocalic /r/ ([r] and [ɹ]). Was this his way of allowing for
both rhotic and non-rhotic accents? No other allophonic distinctions are made in
consonant symbols, despite the very clear range of aspirated and unaspirated voiceless stops
that exist in English, and the clear and dark /l/ allophones, as well as other allophonic
variants. Murray’s notation was continued by Robert Burchfield in his four-volume 
Supplement to the OED, but for the second edition, in 1989, it was replaced by an IPA 
transcription, giving alternative pronunciations, and also American pronunciations. 


29.2.2.3 Wyld’s Universal Dictionary 


In 1932, Henry Wyld, Merton Professor of English Language and Literature at Oxford, 
and Fellow of Merton College, published his Universal Dictionary of the English 
Language. This used both respelling and phonetic notation: ‘the pronunciation of each 
word is recorded both in a simple and phonetic notation, and in a less exact and popu- 
lar mode of spelling which the Publishers believe to be more generally intelligible. The 
pronunciations indicated are those current in good society, and where more than one 
pronunciation of the same word are current, both are given, that preferred by the Editor
being put first.’ Neither of the two systems follows the principles of the IPA exactly.
In the body of the work, the first representation given for each word is the respelling,
which uses the now familiar ‘ā’, ‘ē’, ‘ī’, ‘ō’, and ‘ū’ for /eɪ, iː, aɪ, əʊ/ and /juː/, and ‘ār’, ‘ēr’,
‘īr’, ‘ōr’ (but also ‘aw’ for orthographic <aw>), and ‘ūr’ for /ɛə, ɪə, aɪə, ɔː/ and /jʊə/, but
dispenses with the breve, presumably as being redundant. /ɑː, ɜː, uː, ɔɪ, aʊ/ are
represented respectively by ‘ah’ (but by ‘a’ alone before <rC>), ‘ér’, ‘oo’, ‘oi’, and ‘ow’. Among
the consonants, ‘th’, ‘dh’, and ‘zh’ represent /θ, ð, ʒ/, but oddly, ‘gh’ is used for /x/. The
‘simple phonetic notation’ also uses the macron, but this time as the equivalent of the 
IPA [:], and also italic font for certain vowels and diphthongs: ‘a’ (father), ‘x’ (hat), ‘a 
(but), ‘€ (get), ‘i (pin), T (seat), ‘a’ (hot), ‘3 (awe, torn), ‘A’ (bird), ‘2 (only in unstressed 
or unaccented syllables’), ‘ti’ (hoot), ‘uw’ (put), ‘aw (house), ‘ow’ (stone), ‘ai’ (white), ‘or’ 
(toy), ‘ea’ (air), ‘e? (made), ‘ia’ (hear), ‘ia’ (field), ‘ua’ (‘one pronunciation of sure, poor’). 
The consonants are mainly as in IPA, apart from ‘p’ for [9], and ‘? for [3]. Wyld uses ‘3’ 
for [y] in German words (e.g. Tag), and ‘y’ for [x] in loch. Stress is shown in both sys- 
tems by an acute accent placed above the vowel. From 1930, Wyld was a member of the 
BBC’s Advisory Committee on Spoken English, and it may be as a result of its influence 
that he chose to show pronunciations in both ways: the Broadcast English booklets pub- 
lished by the Committee also gave pronunciations in both a respelling and a phonetic 
notation (James 1928-37). 

The accent that Wyld is describing is one he calls ‘Received Standard’—‘that type of 
English which is spoken by those who have been educated at one of the older Public 
Schools’ (Universal Dictionary of the English Language: xix), i.e. the RP of his day. 
This is a non-rhotic accent, and his two transcription systems could therefore be 
confusing: the respelling retains all orthographic <r>, while the phonetic representation 
has no pre-consonantal /r/. This leads to the words farther and father being given 
different respellings, ‘fardher’ and ‘fahdher’, but identical phonetic transcriptions: [fāðə]. 
There is no explanation in the introduction that ‘ar’ represents /ɑː/ before a consonant 
but /ær/ before a vowel (farrow is shown as ‘farō’), and Wyld’s use of ‘a’ to mean different 
vowels in the two systems (/æ/ in the respelling and /ʌ/ in the transcription) might easily 
confuse the unwary. The respelling system has no symbol for schwa, and most occur- 
rences of schwa are not indicated at all (the orthographic short vowel is retained), but on 
the other hand final <-on>, in action or carbon, for instance, is respelled as ‘un’: ‘ákshun’, 
‘karbun’. 


29.2.2.4 IPA Transcriptions 


The IPA is not monolithic: to coin a phrase, there’s more than one way to skin a cat. All 
the current dictionaries of the Oxford stable that use an IPA transcription (the Pocket 
and Little Oxford Dictionaries still use respelling) diverge from the usage in most oth- 
ers, as they have adopted the symbols chosen by Clive Upton, William A. Kretzschmar 
Jr., and Rafal Konopka for their Oxford Dictionary of Pronunciation for Current English 
(2001). Most editors retain the symbols /æ, e, aɪ, eə/ for the vowels in hat, bet, my, hair, 
but following Upton et al., Oxford now uses /a, ɛ, ʌɪ, ɛː/ instead, a change which has not 
found favour in all quarters. It is generally admitted, for instance, that the /eə/ diphthong 
is increasingly being heard as the monophthong /ɛː/, but not that it is so widespread that 
the symbol for the diphthong should be dropped completely. The change to /ʌɪ/ has been 
particularly criticized by other linguists as unnecessary. However, these symbols do not 
contravene the principles of the IPA. All dictionaries using IPA retain the principle of 
representing the phonology rather than the raw phonetics: only distinct phonemes are 
given their own symbols, but in recent years two additional symbols have been introduced 
that break this convention: [i] and [u]. It is unclear in many cases whether the 
sounds spelled with the letters <i> or <y>, and <u> in unstressed syllables, particularly 
those ending in <y>, belong to the phonemes /iː/ and /uː/, or /ɪ/ and /ʊ/. The use of [i] 
and [u] allows either interpretation to be made. 


29.2.2.5 American Dictionaries 


The United States has never taken to the IPA in the way that other countries have, and all 
the most popular and comprehensive American dictionaries have always used a respell- 
ing system, with diacritics, to show pronunciations. 

Another difference between American dictionaries and British ones has been the 
range of pronunciations shown. While most British dictionaries, at least until the end of 
the nineteenth century, showed only one pronunciation—the one considered by the edi- 
tor to be the ‘correct’ one in use amongst ‘educated’ speakers of the language, American 
editors admitted from very early on that it was not appropriate for them to restrict 
themselves in this way. As mentioned above, Worcester, writing in 1847, compared the 
pronunciations given by previous editors, and admitted that the ‘best’ speakers varied 
amongst themselves. The editors of the current American dictionaries all go to great 
pains to stress this point: 


While usage is still and must always be the standard, it is no longer the usage of a particular 
locality, since the pronunciation of no one locality can now claim admitted 
precedence. Nor can the pronunciation of any one person, or group of persons, be 
taken as a standard for all .... The standard of English pronunciation .. . is the usage 
that now prevails among the educated and cultured people to whom the language 
is vernacular; but ...since somewhat different pronunciations are used by the cul- 
tivated in different regions too large to be ignored, we must admit the fact that uni- 
formity of pronunciation is not to be found throughout the English-speaking world, 
though there is a very large percentage of practical uniformity. (Merriam-Webster 
New International Dictionary, second edition, 1934: xxvi, and almost identically in 
the Webster's Third New International Dictionary, 1961: 38a-39a) 


There can be no objective standard for correct pronunciation other than the usage 
of thoughtful and, in particular, educated speakers of English. Among such speak- 
ers one hears much variation in pronunciation. (Merriam-Webster’s Collegiate 
Dictionary, eleventh edition, 2003: 33a.) 

Pronunciations are listed primarily according to known frequency of use. Unless 
otherwise labeled, they are consistently standard and may be used freely in all social 
circumstances. (Webster's College Dictionary, Random House 1991: xxix.) 

There is... no single authority on pronunciation in this country, and no standard 
dialect. (Funk & Wagnalls Standard Dictionary of the English Language, International 
Edition, 1966: viii.) 


While several of these dictionaries make use of the IPA in their introductory notes on 
pronunciation, which are frequently long sections (fifteen large pages of three columns 
each in the case of Webster’s Third), none of them use it in the body of the work. Funk and 
Wagnalls’ more recent Standard College Dictionary has an article on Regional Variations 
in American Pronunciation (1977: xii-xiv) which uses IPA symbols throughout, and one 
wonders why, if readers of the dictionary are able to interpret IPA symbols sufficiently to 
understand this, they cannot be trusted to read the rest of the dictionary in a similar way. 

The Random House Webster’s College Dictionary (despite the word ‘Webster’ in 
the title, it has no connection with the Merriam-Webster dictionaries)⁵ justifies its 
use of respelling on etymological grounds: ‘Such a symbol system reflects underlying 
sound–spelling relationships in English. Just as the stressed vowels of divine and divinity 
are spelled with an i, the pronunciation symbols for these two sounds (ī) and (i) respectively, 
are also forms of i. By contrast, were we to use symbols from the IPA, which reflect 
sound–spelling correspondences in Latinate languages, the stressed vowels in divine 
and divinity would have to be rendered as [aɪ] and [ɪ], and the underlying relationship 
between these would be obscured.’ (1991: xxix) 

Nevertheless, the most recent dictionaries all make use of the schwa to represent 
the neutral vowel. The Merriam-Webster New International (1934) had a range of symbols 
for /ə/, depending on context; for instance \ȧ\ was used in sofa, abound, while \ă\ 
occurred in account.⁶ This latter is explained as ‘a variable sound between the limits of 
ȧ in sofa and ă in agitate.’⁷ Webster’s Third (1961) dispenses with these niceties, and transcribes 
all such vowels with schwa. 

Interestingly, all the dictionaries I have examined mark stress in the same way: an 
acute accent following the stressed syllable, but they distinguish primary from second- 
ary stress: primary stress has a bold accent, while secondary stress has the accent in 
‘light’ (not bold) type. It is not always easy to distinguish these two symbols. 


29.3 CONCLUSION 


The last English dictionary with pretensions to authority which did not include a 
guide to pronunciation would appear to have been Johnson's Dictionary of the English 
Language in 1755. Since then, whether eighteenth-century orthoépists trying to 
‘improve’ the English of their dictionaries’ users, or twentieth- and twenty-first-century 


5 Compare Landau (2001: 410-12). 
6 \...\ is the symbol used to mark off the phonetic representation in this dictionary. 
7 ‘Pronunciation’ (Merriam-Webster New International 1934: xlii). 
linguists describing the language, lexicographers have found ways of representing 
pronunciation. It is said that IPA transcription is now taking over from respelling systems 
in British dictionaries (e.g. Hacker 2012: 16: ‘The late-twentieth-century change 
from respelling to IPA in most monolingual English dictionaries’), but I hope that I 
have shown that many general dictionaries intended for a native-speaker audience still 
use methods of respelling, which their editors claim either is easier for the layman to 
understand than a phonetic transcription, or can be interpreted more easily in a range 
of accents and dialects, as alphabetic letters can be used for a greater range of sound 
qualities than the more precise IPA symbols. We have seen that some of the claims for 
respelling, for instance the range of accents covered, are less than accurate, as northern 
English, for example, does not have the /ɑː/ vowel in many places where it occurs 
in southern English, and Scottish English generally has two diphthongs where other 
dialects have one, distinguishing between tide and tied. What I find remarkable is the 
longevity of some of the respelling conventions: even the most recent editions of some 
dictionaries are still using the macron above vowel letters to represent the sounds used 
for the ‘names’ of those letters (mate, mite, mote), a tradition that goes back to the very 
first so-called ‘pronouncing’ dictionary of 1757. 


CHAPTER 30 
LABELLING AND 
METALANGUAGE 


CHARLOTTE BREWER 


30.1 INTRODUCTION 


From early on, dictionary makers have realized that a definition alone is not enough to 
characterize a word. Their readers need more information than this if they are to under- 
stand the words that a dictionary defines, in order to appreciate the connotations that a 
word has in context, and to be able to use the word effectively themselves. So right from 
the start of the English monolingual dictionary tradition, we find an array of accompa- 
nying devices to help provide such information—symbols on the one hand, and labels, 
notes, or comments on the other. Cawdrey’s Table Alphabeticall of 1604, commonly 
regarded as the first English monolingual dictionary, set out to teach the ‘true... under- 
standing of hard, usuall Englissh wordes, borrowed from the Hebrew, Greeke, Latine, 
or French’—or so the title page declared—and a note at the start of the volume lists the 
symbols he used to indicate the language of origin of some of his headwords: ‘(g. or gr.) 
standeth for Greeke’; ‘The French words haue this (§) before them’. Labels of this kind 
communicate two different types of information: first, that the terms they accompany do 
indeed have a specific characteristic (in this case, that they are either of Greek or French 
origin, according to their accompanying symbol), and secondly, that all unmarked 
terms belong to some sort of default category (in this case, words which are not of Greek 
or French origin). This is an important double characteristic of all the ways in which 
lexicographers seek to mark individual items of vocabulary: not only is any specific term 
identified as having a special characteristic—be it slang, offensive, archaic, literary, or 
whatever the feature is that is being picked out as important for the user to be aware 
of—but simultaneously all the other unmarked terms in the dictionary are identified 
as not having this feature. As Svensén (2009: 315) puts it, labelling ‘implies that a certain 
lexical item deviates in some respect from the main bulk of items described in a dictionary 
and that its use is subject to some kind of restriction’. Cawdrey found it difficult to 
be consistent, accurate, and comprehensive in labelling his terms, and so have all his 
successors. This article discusses the use of labels and other sorts of metalanguage to 
characterize vocabulary in selected past dictionaries in order to illuminate their deploy- 
ment in dictionaries today. The proliferation and refinement of such information has 
correlated with changing societal norms on the one hand and with increased sensitivity 
towards dictionary users on the other, while the principles and practices involved in 
applying labels to vocabulary items in dictionaries continue to vary. 


30.2 USE OF LABELS BY EARLY 
LEXICOGRAPHERS OF ENGLISH 


Cawdrey’s labels were designed to further characterize terms which had already been 
selected for their special character—hard words that had recently entered the language 
owing to ‘inkhorn’ scholarship, namely their unfamiliarity owing to foreignness—but 
as dictionaries developed into more general works, the nature of the vocabulary they 
contained became far more various. Lexicographers rose to the accompanying chal- 
lenge by providing a wider range of labels, symbols, and metalinguistic comments in 
order to provide users with more nuanced information, above and beyond the defini- 
tion of a word, of one sort or another. Hard words themselves were early identified by 
dagger symbols (from Phillips’ New World of English Words, published 1658, onwards), 
while more various techniques—asterisks, abbreviations, or usage notes—were used 
(from Bullokar’s English Expositor, published 1616, onwards) for distinguishing ‘old’ 
words (Osselton 2006). One of the most pressing needs felt by both users and lexicog- 
raphers has been the provision of information about register and usage, so as to be able 
to warn users (supposing they did not already know) whether a term is polite, informal, 
offensive, outdated, may indicate vulgarity in its user, and so on. Johnson's Dictionary 
of 1755 is often regarded as the first to have delivered prescriptive judgements on usage, 
branding words as ‘low’, ‘barbarous’, ‘despicable’, ‘ludicrous’, ‘mean’, ‘vulgar’, etc. But in 
fact many of his predecessors had also applied such judgements by different means, for 
example labelling them ‘c’ for ‘cant’, often (but not always) marking the same terms that 
Johnson later picked out, for example bamboozle (‘a cant word not used in pure or in 
grave writings’), to brush (‘To move with haste’: ‘a ludicrous word’), glee (‘joy; merrement; 
gayety’: ‘not now used, except in ludicrous writing’), prentice (‘contracted, by colloquial 
licence, from apprentice’), snip (‘A share’: ‘a low word’), tiny (‘A burlesque word’), 
and woundy (‘Excessive’: ‘a low bad word’) (examples quoted from Osselton 2006: 102). 
Of course, the question as to whether Johnson, and his forebears, were really being 
prescriptive in characterizing words thus is not easy to determine (see Mugglestone, 
this volume). As has been observed by many lexicographical commentators, ‘labelling 
is the least scientific and most artistic part of the lexicographer’s task. ... Status labels 
arise from a judgement about usage rather than a judgement about meaning, and hence 
emerge from the lexicographer’s sense of the language’ (here in the words of Richard 
Bailey, reported in Pijnenburg and de Tollenaere 1980: 310). Was Johnson recording 
norms of usage which were widely recognized (i.e. being descriptive), or was he deliv- 
ering judgements that might be disputed by others of similar class and education (i.e. 
being prescriptive)? At this distance it is often difficult to say.¹ Nevertheless, once norms 
are codified in word books in this way, even if the writers originally intended them to be 
descriptive, they swiftly become prescriptive. Users will treat even avowedly descriptive 
works as if they set out to prescribe what should be rather than record what is. 


30.3 THE PRACTICE OF THE OED 



Impartial and accurate observation of a word's specific characteristics is, however, what 
most general modern English dictionaries aim at, and in this respect the Oxford English 
Dictionary (OED), first published in instalments between 1884 and 1928, played a cru- 
cial role as the forerunner of descriptive lexicography based on (what might loosely be 
described as) corpus evidence: around 5 million quotations from printed texts drawn 
from all periods of the language from 1150 onwards. The OED lexicographers subjected 
these to intensive scrutiny to determine the meaning of words, the development of different 
senses, and the connotations of usage in context. The latter were variously discussed 
by the chief editor J. A. H. Murray in his ‘General Explanations’, printed in the 
first volume of the OED, where he explained that he used a range of terms to characterize 
a word where appropriate. Least problematic of these would seem to be labelling words 
by subject (Mus. (in Music), Bot. (in Botany), etc.), or by variety of English, when the 
word ‘is not current in the standard English of Great Britain, as U.S., N. Amer., Austral., 
etc.’ However, Murray articulated these ideals in 1884 and the dictionary was not to be 
completed for another forty-four years. Inevitably, over so long a period of compilation, 
with editors and editorial assistants coming and going, consistency in labelling practices 
(as in everything else), however desirable, was never a practicable possibility, and the 
OED from early on ran into problems with the issue already identified, that not labelling 
a word categorizes it, by default, as a member of the unmarked group of words which 
constitute the dictionary as a whole. Even within the same stretch of the alphabet uni- 
formity in labelling proved impossible. Thus clavicembalo, a name for a harpsichord, 
was labelled ‘Music’, but clavichord and clavicithern, immediately following, were not. 

It was similarly difficult for the OED lexicographers to be consistent in their applica- 
tion of regional labels and in identifying and characterizing words of foreign origin that 
had entered the language at various stages in the past. These Murray divided into denizens 
(‘words fully naturalized as to use, but not as to form, pronunciation, or inflection’), aliens 


1 Cf. Considine (2012: 97): ‘when Johnson calls the noun squelch a “low ludicrous word” he does 
not mean to identify it prescriptively as ridiculous, but to signal its restriction to burlesque or playful 
contexts’. See further Barnbrook (2005) and McDermott (2005). 
(‘names of foreign objects, titles, etc., for which there were no native equivalents’), and 
casuals. But he had to acknowledge that ‘There are no fixed limits between these classes’ 
and the symbol he chose to designate a degree of foreignness—the so-called tramlines 
mark, that is, two parallel lines—was applied with manifest inconsistency both in the first 
edition of the OED and in the four-volume Supplement of 1972-86 (and consequently in 
the second edition of 1989 as well, since this was formed by typographically merging OED 
with the Supplement).² Thus the Hindi words bhoodan, dhal, dhobi, gramdan, and many 
others are ‘tramlined’, but dhoon, dhoona, and dhurrie, and many others are not; neither 
the quotation evidence for these words nor their other characteristics fully explain 
why they should be thus differentiated. The new, third edition of OED (2000-), which is 
revising the dictionary for the first time in its entirety, has decided to drop this symbol 
altogether, no doubt to avoid inconsistencies of this sort but also, perhaps, because it is 
decisively repudiating the Anglo-centric implications of this policy in previous editions. 

The first edition of OED used a number of other symbols to label words, with equally 
uneven results. As an historical dictionary aiming to record the whole vocabulary of 
English up to the present day (i.e. the late 1900s), it needed to mark words whose usage 
had died out at some stage in the history of the language, and also—given the impor- 
tance it attached to literary sources which were often linguistically deviant—to indicate 
eccentricity or rareness. For obsolete words Murray borrowed the dagger favoured by 
many of his predecessors, noting that ‘in the case of rare words... it is often difficult to 
say whether they are or are not obsolete’. The symbol ‘-1, or -0’ indicated ‘that only one, 
or no [!] actual instance of the word is known to us’, while the label rare was used for rare 
words and the label nonce-wd. to identify ‘words apparently employed only for the nonce’. 
Once again, the application of these labels was inconsistent, and the user is left puzzled 
as to what to infer from the absence of these terms as much as from their presence (astu- 
city, quoted only from Carlyle, is left unmarked, as are cheepy, crack-tryst, hirundine (‘of 
or pertaining to a swallow’) and numerous other words taken from his works, while oth- 
ers, also supported by only one quotation, are labelled nonce-word, rare, etc.). 

Murray's ‘General Explanations’ also instanced arch. (archaic or obsolescent), colloq. 
(colloquial), dial. (now dialectal ...), as examples of ‘status’ labels. Very surprisingly 
to modern linguists, who recognize OED as the pinnacle of descriptive lexicography, 
he used another symbol (¶) to identify what he described as ‘catachrestic and erroneous 
uses, confusions, and the like’. This is evidently an odd feature to find in a descriptive 
dictionary, and it is clear that in a small number of instances both Murray and his 
fellow-OED lexicographers sought to impose their own views on the impropriety or 
undesirability of certain usages, even when these were amply attested by their quotation 


2 For more on tramlines see Ogilvie (2008) and Micklethwait (2009). Ogilvie’s view that the absence 
of tramlines from the 1933 OED Supplement represented a linguistically and morally enlightened change 
in editorial attitude towards non-English cultures should be treated with caution, given that the editor 
she identifies as responsible, C. T. Onions, showed little sign of such anachronistic broadmindedness in 
other respects (cf. his definition of white man as ‘a man of honourable character such as one associates 
with a European (as distinguished from a negro)’). 
evidence. Notably, Murray deplored dropping the p in pronouncing words of Greek origin 
beginning ps, pn, etc., describing this as ‘an unscholarly practice often leading to ambiguity 
or to a disguising of the composition of the word’. In entries for these words, he marked 
initial p as ‘an optional pronunciation which is recommended especially in all words that 
retain their Greek form (e.g. psora, psyche), and in scientific terms generally, which have not 
been irretrievably mutilated by popular use’ (see further Brewer 2007). 

To today’s descriptive linguists this looks like doublethink. How could the original 
OED, conceived as a comprehensive and impartial inventory of words used through 
time, refer to the process of diachronic change as irretrievable mutilation? Such apparent 
confusion and inconsistency—even in so great a lexicographer as James Murray—has 
however characterized the application of usage labels and metalinguistic comments over 
the course of the twentieth century in both the OED itself and in many (perhaps all) other 
English dictionaries. Today's OED has dropped the paragraph symbol along with the 
tramlines one, substituting usage notes if it deems this appropriate, and hence has fallen 
in more consistently than preceding editions with the general twentieth-century trend 
towards lexicographical descriptivism. (In this respect, in fact, the historical component 
of the OED gives it a special advantage, since its quotations of actual usage over time can 
often contextualize views about what is or is not standard English: for example, it reveals 
that the earliest recorded sense of disinterested is ‘not interested’, notwithstanding that 
this sense is often marked in contemporary dictionaries as one objected to by traditional- 
ists as a corruption of the supposedly original sense ‘unbiased by personal interest’). 

Murray’s remarks occur in a note at the head of the entry for ps-, and many other 
ad hoc, hence difficult to categorize, usage comments are similarly to be found in indi- 
vidual entries scattered throughout the dictionary’s half a million entries. OED’s prac- 
tice was to use a combination of ways to label its words: notes of varying sorts on the 
one hand, and more standardized labels (sometimes before, after, or as part of the defi- 
nition) on the other. As already described, Murray gave examples of some of his labels 
in his ‘General Explanations’, and more can be found in OED’s list of abbreviations (e.g. 
euphem[istically], obs[olete]; also fig[urative, -ly] and transf., for ‘transferred sense’). But 
many labels, not being abbreviations, were not included in this list, for example ‘abusive’, 
‘coarse’, ‘contemptuous’, ‘emotional feminine’, ‘derogatory’, ‘euphemistic’, ‘familiar’, 
‘foul’, ‘jocular’, ‘literary’, ‘offensive’, ‘poet[ic]’, ‘rhetorical’, ‘slang’, ‘vulgar’. It is self-evident 
that terms like these, many of which point to the attitudes held by speakers and/or writ- 
ers of the vocabulary in question, perfectly illustrate Bailey’s dictum that labels reflect 
‘a judgement about usage rather than a judgement about meaning’. As used in the first 
edition of the OED, they reveal views on race, gender, sexuality, politeness, or literary 
values which shed light (in fascinating and sometimes unexpected ways) on the culture 
of the late nineteenth and early twentieth centuries and on changes since then, thus valuably 
enhancing this dictionary’s historical function.³ And like Johnson's labels, they also 


3 In this respect Oxford University Press’s decision to withdraw the electronically searchable version 
of the second edition of OED from the OED Online website, through which one used to be able to access 
raise the question whether and to what extent ‘the lexicographer’s sense of the language’ 
in OED is being applied prescriptively or descriptively. All this means that the labels and 
usage notes in the first edition of the OED are a fine tool to identify and discriminate culturally 
significant vocabulary on the one hand, and/or lexicographical attitudes towards 
such vocabulary on the other, especially in view of OED’s seminal status in English lexicography 
and its role as a cultural icon. Clearly, however, given the range of methods used 
to signal judgement, Murray's ‘General Explanations’ of 1884 insufficiently described 
labelling policy as it developed over the next forty-four years, and no version of OED to 
this day has published either a comprehensive list of labels or an account of the principles 
underlying their application. But it is certainly not alone in this: nearly 140 years on, it 
seems that no dictionary has given an entirely satisfactory description or explanation of 
its policy and practice in this respect. With other dictionaries as with the OED, however, 
labels point to central aspects of the lexicographer’s assumptions about his or her work.⁴ 


30.4 EXPLORING THE IMPLICATIONS OF 
LABELLING PRACTICE SINCE THE FIRST 
EDITION OF THE OED 


Had Murray and his fellow OED lexicographers supplied a document of this sort, they 
might have told us more explicitly who they were writing their dictionary for. For there is 
no doubt that the OED, in common with other nineteenth- and early twentieth-century 
English language dictionaries, assumed a relatively homogeneous readership—white, 
middle-class, educated, with culturally and geographically distinct experiences. These 
expectations can sometimes be seen in the non-linguistic labels applied to the referent 
of a term, as well as in the linguistic labels applied to a term itself. OED’s tag ‘well known’, 
for example, to mean ‘likely to be recognized by the typical dictionary user’, points to 
the unarticulated expectation that this person is or has been resident in Britain and is a 
member of the educated middle-class: so anchor, to mean ‘anchoress’, is ‘well known in 
the book title Ancren Riwle’; champagne is ‘a well-known wine’; a charley, that is, a small 
triangular beard, is ‘well-known in the portraits of Charles I and his contemporaries’; a 
cock-roach is ‘a well-known dark-brown beetle-like insect’; the Dunciad is ‘the name of a 
well-known poem by Pope’; thrift is a ‘well known sea-shore and alpine plant’, and so on.⁵ 


the first edition and thus search its labels, other definitional terms, and usage notes, is to be particularly 
regretted. 

4 On labels in the first edition of the OED, the Supplement, and the nascent third edition see 
Burchfield (1973), Mugglestone (2000b), Brewer (2005a, 2007a: 174-212, 244-9). 

5 Chardonnens (1997) counted 144 uses of this term altogether (all but a few of them reflecting value 
judgements), of which 34 occurred in botanical and 26 in zoological entries respectively and indicated an 
unmistakably Anglo-centric orientation. 
Over the course of the twentieth century, however, with rising levels of literacy, education, 
and general exposure to print, both in UK and US populations and in English-speaking 
countries further afield, the readership for English language dictionaries has 
become much more diverse, while dictionaries themselves now contain a much more 
varied range of vocabulary (more slang and colloquial language, more technical and sci- 
entific vocabulary, and much more World English). Correspondingly, the need for labels 
and usage notes has increased, although different publishing houses have responded 
to this need in different ways and at varying stages. Labels (and associated metalan- 
guage)—their presence, absence, and character, particularly as applied to contentious 
or potentially contentious vocabulary—can function as a litmus test for the dictionary 
as a whole. Certainly they tell us what lexicographers, consciously or unconsciously, 
think their users want or need from such a work. The notorious instance of this has been 
the public rumpus greeting publication of Webster’s Third New International Dictionary 
(1961), whose editor Philip Gove actually reduced the number of usage labels indicat- 
ing the register of a word which had been supplied by the previous edition of Webster's 
International (the second edition of 1934), eliminating the label colloquial altogether and 
applying the label ‘slang’ much more selectively than in previous editions. Instead, Gove 
expected his readers to divine the connotations of words from usage notes indicating 
the status of words that were ‘on the borderline of standard English’ and from illustrative 
quotations. The absence of condemnatory labels for usages such as ‘who’ as the object 
of a verb or a following preposition, or for ‘ain't’ infuriated newspaper reviewers up and 
down the country who assumed that such items were being categorized by the Third as 
part of the standard language it documented by default, notwithstanding Gove’s careful 
use of usage notes. On the use of who as an object he had written ‘used by speakers 
on all educational levels and by many reputable writers, though disapproved by some 
grammarians’, and of ain't, ‘though disapproved by many and more common in less 
educated speech, used orally in most parts of the U.S. by many cultivated speakers’ (who 
as object had been given a more cautious usage note in the second edition of 1934 and 
had not been noticed in the 1924 edition, while ain't had been labelled ‘Dial. or Illit.’ in 
1934 and ‘colloq. or illiterate’ in that of 1924). Instead, readers feared that the descriptivism 
inherent in this dictionary project would have a prescriptive effect, so that Gove 
was sanctioning and thus virtually recommending such usage (see further Morton 1994: 
135-41, 153-214). 

Most other dictionaries did not follow suit. Initially in the USA, but increasingly 
in the UK as well, no doubt in response to the outcry with which Webster's Third was 
received, lexicographers have improved the sensitivity and range of labels and have 
experimented with many different ways of providing contextual information about the 
words they define, particularly on their register and social acceptability. This has coin- 
cided with dictionaries’ increasingly self-conscious orientation towards users’ needs 
(particularly those of language learners), as identified by Fontenelle in this volume. It 
has often resulted in the shifting of information from labels, which can be overlooked 
by users, to much more explicit (and visible) usage notes or other forms of metalinguis- 
tic language, in the attempt to satisfy the linguistic requirement for descriptivism while 
warning users that some terms, although widely used, are regarded by some members of 
the language community as contentious in some way. 

The proliferation of methods and symbols has come at a cost, helpfully documented 
by Norri in two articles (Norri 1996 and 2000) which examined the labelling policies of 
ten British and American dictionaries (published in the last few years of the twentieth 
century) to indicate regional and derogatory usages respectively. Where geo-political 
categories were concerned, Norri found that the boundaries between Scotland and 
England, Ireland and Scotland, Scotland and New Zealand, and Australia and New 
Zealand repeatedly presented problems. kirk and burn, for example, were sometimes 
labelled as Scottish, sometimes as Northern English words; banshee as sometimes 
Scottish, sometimes Irish; ashet restricted to Scotland in some dictionaries and Scotland 
and New Zealand in others; and Jarrikin similarly inconsistently restricted to Australia 
or to Australia and New Zealand. Random checks of these words in more recent dic- 
tionaries confirm that such variations continue to this day. Discrepancies were not lim- 
ited to these territories, however: Norri gives ample examples of inconsistencies both 
among and between British and American dictionaries in applying the labels British and 
American respectively. After thorough analysis, he concludes: 


The picture that emerges from a study of regional labelling in American and British 
dictionaries is essentially one of disharmony. It is simply not possible to discern 
any common policy or system. Almost all of the hundred words examined for the 
present article are regionally labelled in some of the dictionaries, unlabelled in others. 
Moreover, words that clearly have similar status are often treated quite differently 
in one and the same volume. The system for labelling, such as it is, is usually 
explained rather perfunctorily in the preface, most often by merely providing a list 
of the regional abbreviations used. There is considerable room for improvement and 
elucidation in all ten dictionaries (Norri 1996: 26).⁶ 


Derogatory, or potentially derogatory, usage was, Norri found, equally inconsistently 
identified and marked in his representative dictionaries. But the reasons here are more 
complicated, often relating to temporal shifts in cultural sensitivities which dictionar- 
ies have caught up with at different rates. Terms of racial distinction, in particular, eas- 
ily slide into pejorative or offensive usage, so that denotation correspondingly shades 
into connotation, but societal and/or lexicographical awareness of this has varied 
enormously. For example, the first edition of OED had defined words like blacky and 
darky (both labelled ‘colloq[uial]’) with no indication that they were offensive. Both 
terms were left untouched in the OED Supplement (1972-86) but were picked up in the 
second edition of the OED (1989) and labelled, respectively, ‘Now Hist. or derogatory’ 
and ‘usu. considered patronizing or mildly offensive’. By contrast, other contemporary 
British dictionaries, for example the Longman Dictionary of the English Language 


6 Norri’s findings on inconsistency in regional labelling are confirmed in the independent study in 
Osselton (1995: 34-45). 
(1984) and even the usually more sparing Chambers Twentieth Century Dictionary 
(1983), were clear that darky was more objectionable than the 1989 OED recognized, 
labelling the word ‘derog.’ and ‘offensive’ respectively (since blacky was obsolete by 
the 1980s it was not included in either the 1984 Longman Dictionary of the English 
Language or the 1983 Chambers Twentieth Century Dictionary). The OED’s sensitiv- 
ity to the potential for giving offence has increased since 1989, however, and the cur- 
rent version of the third edition (June 2012) has recently changed the labels again, to 
read, simply, ‘offensive’ (or in the case of blacky, ‘Now offensive’). Similarly, chinaman, 
labelled derog. in both the 1984 Longman Dictionary of the English Language and the 
1983 Chambers Twentieth Century Dictionary, continued to be defined in OED, from 
first publication right up to 2010, as ‘a native of China’, without warning labels of any 
sort, despite the fact that across the Atlantic Philip Gove had been forced to make a 
correction to later printings of Webster’s Third after his original entry for chinaman 
had been published similarly denuded in 1961 (Morton 1994: 239).⁷ 

Norri’s article gives ample evidence of similar inconsistencies in identifying and 
labelling for other examples of derogatory vocabulary (applied to national, racial, or cul- 
tural groups, to sexual orientation, to men, to women, to deceitful people, and to those 
considered to be lacking in intelligence). A more cursory examination of dictionaries 
current in 2012 suggests that sensitivities have greatly increased where racial and ethnic 
issues are concerned, so that lexicographers almost uniformly identify and mark offen- 
sive or potentially offensive vocabulary of this kind, but that they are less sensitive and 
less consistent in identifying sexist vocabulary or vocabulary on which there is disagree- 
ment on so-called ‘correct’ usage (i.e. where usage is in the process of changing, and 
some influential speakers and writers have resisted change). 

The persistence of such variation in lexicographical judgement is surprising. All dic- 
tionaries of contemporary English today (unlike their predecessors of the 1980s and 
before, other than the innovative Collins Cobuild of 1987) are heavily dependent on care- 
fully assembled language corpora, which provide exactly the sort of systematic infor- 
mation about words and their usage to correct (one might have thought) the subjective 
character of labelling identified by Bailey in 1980. Such corpora enable lexicographers to 
scrutinize collocational and other sorts of contextual characteristics of words (e.g. their 
frequency) so that today’s dictionaries can go much further than their predecessors in 
explaining pragmatic aspects of meaning and in helping users understand a word’s 
contextual appropriateness. One might therefore have expected that labelling practice and 
judgement would be much the same, these days, between one dictionary and another. 
But it remains the case that dictionaries offer their users different kinds of information, 
and their labels certainly show that their lexicographers hold different views about the 
use of words in context. 


7 The most recent version of OED has added the label ‘Now derogatory and offensive’ to this sense of 
chinaman but has not dated the revision, so users can infer only that it took place at some stage after the 
OED website was re-launched in December 2010. 
As already indicated, it is in English language learners’ dictionaries, which are highly 
sensitized to user requirements, that one tends to find the fullest and most systema- 
tized treatment of labels and/or usage notes and other sorts of metalanguage. The 2009 
edition of the Longman Dictionary of Contemporary English (LDOCE), for example, 
intended for ‘advanced learners’, utilizes a wide array of devices to mark the register and 
acceptability of words as well as many other features. Offensive vocabulary is often 
indicated three times over: by label (‘taboo informal’), in the Cobuild-style definition (e.g. 
the explanation of slut as ‘a very offensive word for a woman who has sex with a lot of 
people’), and by an additional direction to the user: ‘Do not use this word’. Additionally, 
LDOCE uses the labels [C] and [U] to indicate whether a noun is countable or uncountable 
respectively and marks parts of speech, the register of a word where applicable (e.g. 
informal), whether a word is British or American English, and so on, employing a wealth 
of symbols and graphic devices which are explained with a diagram of annotated sample 
entries at the front of the volume. The advice is nuanced (something which it is easier 
to achieve in notes than labels): chick to mean ‘a young woman’ is reported to be a usage 
‘some people think is offensive’, while girl to mean ‘woman’ is said to be ‘considered 
offensive by some women’ [my italics]. By contrast the 2011 Chambers Dictionary, which 
markets itself as the ‘unrivalled dictionary for word lovers’, provides much less information 
via labels and usage notes, perhaps because it regards its readers as already highly skilled 
in the areas of language familiarity and knowledge which such material elucidates. 

Wider investigation, covering the Concise Oxford English Dictionary (COED12, 2011), the 
Collins Concise English Dictionary (2008), the Collins Cobuild Learner’s Dictionary, 
Concise Edition (2003), the Shorter Oxford English Dictionary (2007), and the Oxford 
Dictionary of English (2005), reveals that chick to mean ‘attractive young woman’ and 
girl to mean ‘woman’ are inconsistently identified as potentially offensive across a range 
of contemporary dictionaries, and usage notes and labels are sometimes present, sometimes 
absent, to explain why and how certain terms offend traditionalists (e.g. refute to 
mean ‘deny’, decimate to mean ‘destroy a large part’, disinterested to mean ‘bored’). 

But even when dictionaries appear to be aiming at similar groups of users, advice dif- 
fers bafflingly. The Collins Cobuild Learner’s Dictionary, Concise Edition of 2003 provides 
the same helpfully nuanced distinction as LDOCE does on the connotations of chick 
and girl, but the other way round: it says of chick, ‘some women find this use offensive’, 
and of girl, ‘some people find this use offensive’. And while many of today’s dictionaries 
label the term slut offensive or pejorative, reflecting what would appear to be a gener- 
ally shared sense of its connotations, the 2011 Concise Oxford English Dictionary sticks 
with what looks like an old-fashioned definition, ‘slovenly or promiscuous woman’— 
but leafing through previous editions, intended (one assumes) for precisely the same set 
of users, one finds that it has actually dropped the ‘pejorative’ label it introduced in the 
1990 edition. 

Variations and inconsistencies such as these are impossible to assess in any general 
way when encountered in hard copy. As dictionaries issue CDs to accompany printed 
versions, however, their labelling practices can be examined systematically, turning 
them into an easily accessible and analysable source of linguistic information about 
different categories of vocabulary. The current version of OED Online is particularly 
generous, or vulnerable, in this respect, since it provides such a sophisticated battery of 
search tools, including a new dedicated feature allowing one to search across all entries 
for a limited range of labels picked out by the lexicographers themselves (currently 
‘allusive’, ‘archaic’, ‘colloquial and slang’, ‘derogatory’, ‘disused’, ‘euphemistic’, ‘historical’, 
‘humorous’, ‘ironic’, ‘irregular’, ‘poetic and literary’, ‘rare’, ‘regional’). And while it is true, 
as already mentioned, that this dictionary has yet to provide a list of its labels, canny 
users can fish for likely terms (whether applied as labels proper or embedded in dis- 
cursive phrases, such as ‘often regarded as a loose use’) by searching in the definitional 
text and manually excluding irrelevant results. By one method or another, then, one can 
turn up ‘rare’ terms first used in a certain year or period, or vocabulary labelled 
‘offensive’, or ‘consciously literary’, and so on, in order to conduct research into the history 
and use of the language (though caution is required here, since electronic searches of 
OED access a mixed database, blending different versions of the dictionary in a propor- 
tion that changes every quarter as the revision progresses; currently the mix is roughly 
one-third revised third edition and two-thirds unrevised second edition). One can also 
use such searches as a tool to assess the aims and achievements of the dictionary itself, 
including to expose its inconsistencies, which—once one begins to investigate—appear 
to be legion: why should blow meaning ‘fellate’ be labelled ‘coarse slang’, for example, 
while ‘blow-job’ is labelled only ‘slang’? Why are so few terms (only 434 listed in 328 
entries) labelled ‘euphemistic’, and why should these include membrum virile and not 
yard, or go potty and not the succession of ameliorative words used to denote the room 
where this activity might take place (bathroom, lavatory, toilet, etc.)? Do such variations 
reflect a difference of usage, or inconsistency in lexicographical method? 


30.5 SOME POSSIBLE MODELS FOR A MORE 
CONSISTENT APPROACH 


Norri observed that dictionaries in their prefatory matter tend to deal perfunctorily 
with labels and their implications and the same remains true today, though some con- 
tain a paragraph or section on usage in which they tackle issues of prescriptiveness and 
descriptiveness (this is not to say that the lexicographers themselves are not well 
aware of the pitfalls, as Norri again points out (2000: 72)). Instead, accounts and anal- 
yses of labelling have appeared in books and articles on lexicography, notably Norri’s 
own work, along with Landau (in successive editions of his book The Art and Craft of 
Lexicography, most recently 2001 (first edition, 1993), in which he distinguishes eight 
different categories of usage information supplied by dictionaries, in his chapter 5 on 
‘usage’, 217-18) and Svensén 2009 (chapter 18 on “Marking”, 314-32). A useful list and 
survey of recent discussions of labelling in dictionaries appears in Ptaszyński 2010, along 
with proposals for improvement. 
A number of studies reproduce or refer to F. J. Hausmann’s comprehensive tabular 
analysis, reproduced in Table 30.1 from Svensén (2009: 316), which develops and 
systematizes the distinctions first formulated (or adumbrated) by Murray in 1884. 

Of course, transferring this sort of conceptual clarity into effective and meaningful 
lexicographical practice is not straightforward, as evidenced by precisely the sorts of 
variations between dictionaries that we have already examined. Svensén observes, ‘A 
labelling system transforms a continuum to a set of degrees on a scale (e.g. “colloquial”, 
“popular”, “vulgar”). Therefore, it is essential always to keep in mind that a label 
represents an area that has a certain extension somewhere between centre and periphery. 
For instance, expressions belonging to the area “colloquial” can be colloquial to varying 
degrees, that is, be located at varying distances from the unmarked area or from the 
one characterized as “popular”.’ In other words, lexicographical judgement must always 
intervene to draw the line. The problem for any individual dictionary, of course, is draw- 
ing the line in a way that all its potential users will understand and find helpful, while at 
the same time being consistent with (or at least aware of) judgements made on the same 
usages in other dictionaries. 

Atkins and Rundell (2008: 226-33) provide a superbly clear and comprehensive list 
of linguistic labels, defining, characterizing, and distinguishing each with examples and 
analytic discussion in exactly the way that is missing from the dictionaries themselves. 


Table 30.1 Diasystematic marking in a contemporary general-purpose dictionary 


Criterion          Type of marking  Unmarked centre        Marked periphery             Examples of labels
1. Time            diachronic       contemporary language  archaism - neologism         arch., dated, old use
2. Place           diatopic         standard language      regionalism, dialect word    AmE, Scot., dial.
3. Nationality     diaintegrative   native word            foreign word                 Lat., Fr.
4. Medium          diamedial        neutral                spoken - written             colloq., spoken
5. Socio-cultural  diastratic       neutral                sociolects                   pop., slang, vulgar
6. Formality       diaphasic        neutral                formal - informal            fml, infml
7. Text type       diatextual       neutral                poetic, literary, journalese poet., lit.
8. Technicality    diatechnical     general language       technical language           Geogr., Mil., Biol., Mus.
9. Frequency       diafrequential   common                 rare                         rare, occas.
10. Attitude       diaevaluative    neutral                connoted                     derog., iron., euphem.
11. Normativity    dianormative     correct                incorrect                    non-standard


Source: Reproduced from Svensén (2009: 316). 
They identify the chief categories as domain (theatre, cinema, maths, etc.), region 
(including dialect), register (informal, familiar, slang, jargon, offensive terms; as they 
explain, ‘an offensive-term label indicates that the use of this item will cause offence and 
should normally be avoided’), style (e.g. literary), time (obsolete/old-fashioned etc.), 
attitude (pejorative, derogatory, appreciative, etc.; ‘an “attitude” label indicates that the 
use of this word is intended to imply... (approval or disapproval)’; e.g. slender = 
gracefully slim), and meaning type: ‘a “meaning type” label indicates that the item should be 
interpreted ... (literally or figuratively)’. Elsewhere (2008: 403) they stress the importance 
of devising ‘a fine-grained inventory of labels that will allow you [the lexicographer] 
to identify any variation from “default” or neutral values in a word’s style, register, 
regional characteristics, currency, or pragmatic force’. Anyone examining a representative 
selection of dictionaries available in libraries and bookshops today will be in a 
position to support their conclusion, however: namely that ‘labelling is an area of 
lexicography where there is more work to be done’ (Atkins and Rundell 2008: 496). 


CHAPTER 31 

THE EXPLOITATION OF DICTIONARY DATA AND METADATA 
31.1 INTRODUCTION 


This chapter considers the use of dictionary data as input to computational processes, 
especially for research in computational linguistics (CL) and applications in natural- 
language processing (NLP). 

Given that dictionaries are widely regarded (among non-lexicographers, at least) as 
authorities on language, it seems natural to suppose that dictionaries should be a pri- 
mary resource for any application concerned with language. However, this has largely 
proved not to be the case. The reasons for this help to illuminate certain characteristic 
features and limitations of lexicography. Some of these limitations are practical, to do 
with the difficulty of formalizing content intended for human readers. Others are more 
systemic: to whatever extent a dictionary represents a given lexicon, it does so along a 
very particular axis, and this has often turned out to be poorly aligned with the needs of 
computational applications. 

However, the last twenty years have seen significant changes in lexicographic theory, 
in the tools used to research and edit dictionary content, and in the contexts in which 
dictionaries are used. The influence of corpus linguistics has shifted the focus of 
lexicography to usage, context, and quantification. Meanwhile, the transition from print 
publishing to online platforms (not only websites but also apps, plug-ins, and APIs) has led 
to improved data models, better access paths, and a reconsideration of how dictionaries 
are structured and maintained. As a result of these changes, dictionary data are becom- 
ing more accessible and more suitable as a resource for language processing, albeit in a 
less central and less generalized role than had previously been anticipated. 
In this chapter I review some of the reasons why dictionary data have been largely 
discounted as a resource for exploitation. These include the difficulty of parsing diction- 
ary content, the unsuitability of the information given in dictionaries, and the emer- 
gence of alternative resources such as lexical knowledge bases (LKBs), ontologies, and 
corpus-based statistical analysis. I consider how and why dictionaries may contribute to 
language processing tasks in the light of these factors, and what this entails for the way 
that dictionary data are implemented and managed. In particular, the utility of dictionary 
data depends on connectivity: the ease with which they can be integrated with 
information from corpora and from wider data infrastructures. 


31.2 THE TURN AWAY 
FROM DICTIONARY DATA 




When machine-readable dictionaries (MRDs) became available from the late 1960s, 
researchers were quick to explore these resources both as input into and as test-beds 
for the emerging fields of computational linguistics and NLP. Amsler (1980) developed 
computational methods to analyse definitions in the Merriam-Webster New Pocket 
Dictionary, and attempted to derive a taxonomic structure from this analysis. Lesk 
(1986) made an early and influential foray into word sense disambiguation (WSD) 
using definitions extracted from the Oxford Advanced Learner's Dictionary, identifying 
appropriate senses by looking for commonalities between the definitions of neighbour- 
ing words. Through the late 1980s and early 1990s, as MRDs became more widely avail- 
able and accessible, dictionary-based NLP building on these examples became an active 
research field (see Wilks et al. 1996). 
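
The overlap heuristic that Lesk introduced can be illustrated with a short sketch. What follows is a simplified illustration rather than Lesk's own implementation: the sense inventory, its definitions, the stop-word list, and the function names are all invented for the example, and nothing here is quoted from the Oxford Advanced Learner's Dictionary.

```python
from typing import Dict, Set

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"a", "an", "the", "of", "or", "and", "in", "that", "to", "for", "on", "with"}

# Toy sense inventory with invented definitions (not quoted from any dictionary).
SENSES: Dict[str, Dict[str, str]] = {
    "pine": {
        "pine_tree": "a kind of evergreen tree with needle shaped leaves",
        "pine_yearn": "to waste away through sorrow or illness",
    },
    "cone": {
        "cone_shape": "a solid body that narrows to a point",
        "cone_fruit": "the fruit of certain evergreen trees",
        "cone_wafer": "a crisp wafer for holding ice cream",
    },
}


def content_words(definition: str) -> Set[str]:
    """Lower-case a definition and drop stop words."""
    return {w for w in definition.lower().split() if w not in STOP_WORDS}


def overlap(def_a: str, def_b: str) -> int:
    """Count the content words that two definitions have in common."""
    return len(content_words(def_a) & content_words(def_b))


def disambiguate(word: str, neighbour: str) -> str:
    """Choose the sense of `word` whose definition overlaps most with
    any sense definition of the neighbouring word."""
    best_sense, best_score = None, -1
    for sense_id, definition in SENSES[word].items():
        score = max(overlap(definition, other) for other in SENSES[neighbour].values())
        if score > best_score:
            best_sense, best_score = sense_id, score
    return best_sense


print(disambiguate("pine", "cone"))
print(disambiguate("cone", "pine"))
```

On this toy data the tree sense of pine and the fruit sense of cone share the word evergreen, so each is selected when the other word is its neighbour. The sketch also exposes a weakness of the method: tree and trees do not match, so the outcome depends heavily on the exact wording of the definitions.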

But by the middle of the 1990s, it was apparent that this early promise was not being 
sustained, and the approach seemed to be stalling. Ide and Véronis (1993) reviewed pro- 
gress, asking bluntly ‘Have we wasted our time?’; and although their answer was not 
quite an unequivocal ‘Yes’, it was certainly close enough for their evaluation to read like 
a post-mortem. 

There are several reasons for the decline in research interest in MRDs: 


• the difficulty of parsing dictionaries accurately has hindered the computational use 
of their content; 

• large corpora (and the computational power to analyse them effectively) have 
enabled statistical methods to overtake MRDs and other hand-crafted sources of 
linguistic information; 

• dictionaries that were developed to meet the needs of human readers were found to 
be poorly aligned with the information that NLP requires; 

• WordNet became established as the de facto lexical resource for NLP, displacing 
conventional dictionaries; 

• concerns regarding intellectual property have restricted the availability of 
dictionary data to the research community. 


The following sections evaluate each of these points in turn. 


31.3 PARSING: HOW READABLE ARE 
MACHINE-READABLE DICTIONARIES? 


For the information given in a dictionary to be any use at all to a computational pro- 
cess, it must be possible to extract that information as structured data. Minimally, this 
requires the text to be ‘machine-readable’; but given that all dictionaries are now stored 
and processed in digital form as a matter of course, this has become an unhelpfully 
vague term. What is more useful is to consider how parseable a dictionary is: whether 
information is expressed and stored in a way that supports the kind of formalization that 
computation requires. 

The first machine-readable dictionaries were typesetting tapes. Their purpose was 
the production of print editions, and their concern was therefore to control features of 
layout and typography, rather than to encode the lexicographic function of each com- 
ponent of the dictionary entry. The transfer to SGML and then XML around the end of 
the 1990s made it possible to develop more structured and meaningful mark-up models. 
XML encourages a separation of content from presentation. For dictionaries, this means 
that the various components of an entry should be identified semantically, stored in a 
use-neutral formalism, and then automatically reassembled for display on demand. This 
facilitates the composition, standardization, and checking of complex text. 
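
The separation of content from presentation can be made concrete with a small sketch. The element names used here (entry, hw, pos, sense, label, def) and the sample entry are assumptions made for illustration; they do not reproduce any publisher's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic mark-up: each element names the lexicographic
# function of its content and says nothing about fonts or layout.
ENTRY_XML = """
<entry id="e1">
  <hw>kirk</hw>
  <pos>noun</pos>
  <sense n="1">
    <label type="region">Scot.</label>
    <def>a church</def>
  </sense>
</entry>
"""


def render_print(entry: ET.Element) -> str:
    """Reassemble the stored components into a print-style line."""
    parts = [f"{entry.findtext('hw')} ({entry.findtext('pos')})"]
    for sense in entry.findall("sense"):
        labels = " ".join(lab.text for lab in sense.findall("label"))
        label_part = f"[{labels}] " if labels else ""
        parts.append(f"{sense.get('n')}. {label_part}{sense.findtext('def')}")
    return " ".join(parts)


def extract_labels(entry: ET.Element) -> list:
    """Pull out label types and values for computational use."""
    return [(lab.get("type"), lab.text) for lab in entry.iter("label")]


entry = ET.fromstring(ENTRY_XML.strip())
print(render_print(entry))
print(extract_labels(entry))
```

The same stored entry feeds both a human-readable rendering and a machine-oriented extraction of label values, and neither function needs to know about the other.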

However, XML itself only facilitates rather than enforces the separation of content 
from presentation. For many dictionaries, XML mark-up has evolved from earlier typo- 
graphic mark-up, and the data model still follows the lineaments of the entry as it will 
be displayed to the (human) end-user. The adoption of stricter standards built on top 
of XML, such as the Lexical Mark-up Framework (LMF) and Resource Description 
Framework (RDF), has been constrained by the need to allow for the flexible repre- 
sentation of information, and to value user-friendliness and expressiveness over con- 
sistency and formalism. For most dictionaries intended for human readers, the source 
data remain a compromise between relatively formalized values and more freely 
discursive text. 

Divergence from strict consistency may appear in definition style, in the use of meta- 
language, and in the principles and structuring of sense division. The extent of these 
problems varies depending on the age of the dictionary and the target readership. More 
recent dictionaries are typically more standardized and use more granular mark-up. 
Dictionaries developed to handle online and automated querying are better equipped to 
map variable surface forms to canonical lemmas (Pastor and Alcina 2010), and may be 
populated with metadata that obviates some of the messier parsing tasks. Bilingual and 
learner dictionaries are more explicit, and less concerned to avoid repetition or other 
inelegances. These all represent valuable improvements; but human readability always 
seems bound to conflict with computational parseability at some point. 

If a dictionary is the human-friendly shape given to prior information about the 
lexicon, attempting to grapple with these problems would seem a quixotic project: why 
not use that prior information directly? This proposition supposes that dictionary 
data pre-exist the dictionary itself. There is increasing consensus that this is how good 
lexicography ought to proceed. Atkins and Rundell (2008) take it as axiomatic that a 
dictionary project should fall into two stages that are temporally and conceptually dis- 
crete: first, the assembly of corpus-derived information about each word into a lexical 
database; then the composition of a dictionary entry to express this information. The 
first stage is where the hard work happens; get this right, and the second stage should 
be largely a matter of synthesis and summarization. Indeed, some lexicologists argue 
that the dictionary may then be regarded as a mere epiphenomenon of its database (Lew 
2011b). 
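
The two-stage model can be pictured schematically as follows. This is a sketch only: the record structure, field names, and sample data (including the example sentences and collocates) are hypothetical and are not drawn from Atkins and Rundell or from any actual lexical database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SenseRecord:
    """Stage one: facts recorded about one sense (hypothetical fields)."""
    gloss: str
    register: str = ""  # empty string when the sense is unmarked
    collocates: List[str] = field(default_factory=list)
    examples: List[str] = field(default_factory=list)


@dataclass
class LexicalRecord:
    lemma: str
    pos: str
    senses: List[SenseRecord]


def compose_entry(record: LexicalRecord, max_examples: int = 1) -> str:
    """Stage two: summarize a database record as a dictionary entry."""
    lines = [f"{record.lemma} {record.pos}"]
    for n, sense in enumerate(record.senses, start=1):
        label = f"({sense.register}) " if sense.register else ""
        line = f"  {n} {label}{sense.gloss}"
        if sense.examples and max_examples > 0:
            line += ": " + "; ".join(sense.examples[:max_examples])
        lines.append(line)
    return "\n".join(lines)


# Invented sample record; the database holds more than any one entry prints.
RECORD = LexicalRecord(
    lemma="chick",
    pos="noun",
    senses=[
        SenseRecord("a young bird", collocates=["hen"], examples=["a hen with her chicks"]),
        SenseRecord("a young woman", register="informal"),
    ],
)

print(compose_entry(RECORD))                  # fuller rendering
print(compose_entry(RECORD, max_examples=0))  # more concise rendering
```

Different dictionaries, or different editions of one dictionary, then become different calls on the same stored information, which is the sense in which the dictionary might be called an epiphenomenon of its database.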

If this were the case, then NLP researchers would do well to ignore dictionaries, and 
instead draw directly on the underlying lexical database. Mainstream lexicographic 
practice is edging closer to the two-stage database/dictionary model,¹ although the 
realities of dictionary composition are rather less programmatic than the model sug- 
gests. Certainly we are still a long way from being able to treat the dictionary as epiphe- 
nomenal. However, for many purposes it may become more viable for NLP research to 
bypass the dictionary itself, and to draw instead on the underlying database. The more 
problematic issue is whether such a database would be made available for NLP use: the 
question of licensing is considered in Section 31.6. 

In the meantime, a converse approach to making dictionaries more tractable is to 
maintain more formal expressions of lexical facts and relationships, offset as metadata. 
This may be acquired either by abstracting from the existing dictionary content, or by 
overlaying taxonomic and ontological frameworks. This is discussed in Section 31.6. In 
some respects, this may be regarded as a kind of reverse-engineering of an underlying 
lexical database, and the two approaches may serve to complement each other. 


31.4 STATISTICAL NLP 
AND DISTRIBUTIONAL DATA 


Approaches to NLP problems may be broadly classified as either ‘symbolic’ or 
‘statistical’. Symbolic NLP uses determinate rules, usually hand-crafted, describing 
semantics, syntax, and pragmatics. The approach is top-down, since text is analysed by the 


1 The DANTE project (<www.webdante.com>) is a recent proof of the concept; see Atkins et al. 2010. 
application of prior linguistic and lexical knowledge. Performance is improved by 
refining and extending the rule set. 

Statistical NLP involves the automatic acquisition of a language model through 
the analysis of large corpora. The approach is bottom-up, since it does not depend 
on prior knowledge. Such methods treat texts as examples of ‘linguistic events’ 
(particularly the co-occurrence of various features) and so enable patterns and 
associations to be identified, generalized, and quantified (see Manning and Schütze 2003). 
The language model that emerges is expressed in terms of probabilities rather than 
determinate rules. 
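
A minimal sketch may help to show what it means for a language model to be expressed as probabilities derived from co-occurrence. The toy corpus and the function name are invented for illustration; real systems use far larger corpora, together with smoothing to handle events that the corpus happens not to contain.

```python
from collections import Counter
from typing import Dict, List, Tuple


def bigram_probabilities(tokens: List[str]) -> Dict[Tuple[str, str], float]:
    """Estimate P(next word | word) from raw co-occurrence counts."""
    # Count each word only where it has a following word.
    first_counts = Counter(tokens[:-1])
    pair_counts = Counter(zip(tokens, tokens[1:]))
    return {
        (w1, w2): count / first_counts[w1]
        for (w1, w2), count in pair_counts.items()
    }


# Invented toy corpus.
corpus = "the dog barked and the dog slept and the cat slept".split()
probs = bigram_probabilities(corpus)

# No rule states that 'the' is followed by a noun; the model simply
# records how often each continuation was observed.
for pair in [("the", "dog"), ("the", "cat"), ("dog", "slept")]:
    print(pair, round(probs[pair], 3))
```

Nothing in the sketch encodes prior linguistic knowledge: enlarging the corpus changes the estimates without any change to the code, which is the scalability advantage at issue.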

Interest in MRDs through the 1980s and early 1990s was largely connected to sym- 
bolic methodologies: the parsing of dictionary content aimed to derive rules about the 
behaviour and meaning of words in the lexicon, which could then be applied to analyse 
language use. 

Through the 1990s, however, such approaches were rapidly outstripped by the rise of sta- 
tistical methodologies. This revolution in NLP brought a number of significant advantages: 


• language models are derived more directly from examples of use; 

• linguistic phenomena are expressed in terms of quantified probabilities rather than 
positive rules, enabling more flexible heuristics; 

• the automated derivation of a language model is more scalable, re-usable, and 
transferable to different languages. 


Statistical NLP has therefore been able to take advantage of the ever-increasing avail- 
ability of electronic text, and the increasing power and expressiveness of software used 
for corpus analysis. Statistical approaches will tend to improve as a function of corpus 
size, whereas symbolic approaches must proceed by gradual and laborious refinement. 

The most striking success has been in machine translation (MT). Rule-based MT 
attempts to parse the source text syntactically, to resolve semantic ambiguities, and then 
to reconstruct an equivalent text in the target language. Statistical MT uses parallel cor- 
pora to determine and infer translational equivalences, phrase by phrase where possi- 
ble. It is thus a shallower approach, which does not need any deep understanding of the 
source or target languages. 

A simple contrast between symbolic and statistical NLP can be misleading. Most 
practical systems are hybrid to some extent. For example, a statistical MT system may 
include rule-based components (particularly morphological and syntactic knowl- 
edge) to extend the set of equivalences derived from corpora, to make language mod- 
els more flexible, or to help handle novel translation scenarios (see Thurmair 2009). 
Thus the statistical revolution is best understood as shifting symbolic techniques to a 
supplementary role, rather than as directly replacing such techniques. 

The lexical knowledge bases used in such contexts are usually task-specific 
data structures, and may themselves derive from prior corpus analysis. Does 
more general dictionary data have a part to play? It is striking that while interest in 
dictionary-based NLP has declined in favour of corpus-based NLP, so the mainstream 
of dictionary research has itself become more directly corpus-based. In theory, a rigorously
corpus-driven dictionary project constitutes a highly selective and interpreted
digest of corpus data (see Rundell and Kilgarriff 2011). For some purposes, the medi- 
ated nature of lexicography is a useful feature, because it can help to correct for defi- 
ciencies or unevenness in corpora, to distinguish signal from noise, or to mitigate 
problems of data sparseness. 

At the same time, corpus-based lexicography is more or less bound to present lexi- 
cal behaviour as a simple set of features, abstracted from the distributional detail that 
statistical NLP is predicated on. It is true that it is now more common to include some 
explicit indication of frequency, especially in learner dictionaries (see Kilgarriff 1997a); 
but this is generally limited to very coarse-grained bandings at entry level only. This 
is rather different from representing distribution. Whereas frequency is the absolute 
count of a particular event in a particular corpus (say, the occurrence of a certain lemma
or collocation), distribution is a matrix showing how frequency changes as other fea- 
tures change (say, how the frequency of a collocation changes in different sections of 
a corpus). The more features to be considered, the more dimensions the matrix will 
have. Although there may be value in attempting to present richer quantitative infor- 
mation in a dictionary—say, an indication of relative frequency of senses as well as of 
lemmas—this will always be highly selective. 
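
The contrast between frequency and distribution can be made concrete with a small sketch. The two 'corpus sections' below are invented strings, and `collocate_distribution` is a hypothetical helper written for this example; it builds a two-dimensional matrix (section by collocate) for a single node word, whereas a realistic analysis would add further dimensions.

```python
from collections import Counter, defaultdict

def collocate_distribution(sections, node):
    """Count, per corpus section, the words that immediately follow `node`."""
    matrix = defaultdict(Counter)
    for name, text in sections.items():
        tokens = text.lower().split()
        for left, right in zip(tokens, tokens[1:]):
            if left == node:
                matrix[name][right] += 1
    return matrix

# Invented miniature corpus sections (assumption for illustration).
sections = {
    "news": "strong tea strong support strong support",
    "fiction": "strong tea strong tea strong arms",
}
matrix = collocate_distribution(sections, "strong")

# Frequency: one absolute count for the collocation 'strong tea'.
frequency = sum(row["tea"] for row in matrix.values())

# Distribution: how that count varies across sections and collocates.
for name, row in matrix.items():
    print(name, dict(row))
```

A dictionary frequency band would record something like the single figure `frequency`; the matrix is what is lost in the abstraction.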

The utility of dictionary data is therefore dependent on maintaining deep connec- 
tions to the corpus, so that features asserted in the dictionary can be rounded out with 
distributional detail (see Grefenstette 1998: 39). What should such connections look 
like? For most corpus-based dictionaries, a sparse and implicit connection is retained 
by virtue of example sentences extracted from the corpus. A further step is to maintain 
direct pointers to corpus instances in situ; but this is fragile, and not readily transferable
to new corpora. The more flexible approach is to extend and formalize syntagmatic and
collocational information from each dictionary sense, and to use this as an intermediary
layer to locate corpus instances dynamically.² This loose coupling of dictionary and
corpus is still rare in end-user products,³ but is becoming a more regular part of
lexicographic software.

The rise of statistical NLP does not render symbolic approaches redundant, and in 
particular does not negate the value of dictionary data, at least in a supplementary role. 
But it does suggest that dictionary data are useful only to the extent that they are system- 
atically based on corpus analysis, and can be tied to corpus distributional data. 

The following section considers this in more detail in relation to word sense 
disambiguation tasks. 


2 This assumes a collocationally-slanted approach to sense division, discussed in the next section. 
3 Intellectual property issues around corpora are a factor. See Trap-Jensen (2009) for an example 
using a public corpus. 
31.5 THE PERTINENCE OF DICTIONARY 
DATA: SENSE INVENTORIES 


The rise of statistical NLP has posed questions about whether the nature of dictionary
information fits the needs of NLP. As a number of lexicologists have pointed out,
it would be surprising if that was straightforwardly the case (Kilgarriff 2000; Zaenen 
2002). Dictionaries have traditionally concentrated on particularity and idiosyncrasy, 
word by word, whereas most NLP applications engage with relationships and behav- 
ioural patterns between words. Dictionaries have tended to focus on meaning, whereas 
NLP usually proceeds from an analysis of usage. 

More generally, a given dictionary will have evolved to answer the kind of questions 
that its intended readership may put. This question-answering role is different from giv- 
ing an account of the behaviour of a word, and in particular the way that words interact 
in natural language. Purpose-built NLP lexicons therefore look quite different from data 
structures derived from human-readable dictionaries (Zaenen 2002: 239; De Schryver 
2003: 145). This is not a matter of formal representation, but rather of what is being rep- 
resented and why. 

The aspect that has received most attention in this respect is the concept of the sense 
inventory (the set of meanings identified for a given lemma). This has been a central 
concern of computational exploitation of dictionary data, particularly in the service of 
word sense disambiguation. 

As in Lesk’s pioneering work mentioned in Section 31.2, early WSD projects used 
dictionaries both as the source of a sense inventory and as the means to discriminate 
between those senses. However, a number of difficulties emerged with this approach. 
Sense inventories vary depending on the dictionary; the criteria for sense discrimina- 
tion may be obscure; fine sense distinctions are hard to disambiguate, and the value of 
doing so is often unclear. Perhaps most significantly, human performance using exist- 
ing sense inventories has proved to be problematic. It had been supposed that humans 
could be used to assign corpus instances to senses in a given inventory, and that this 
could then serve either as training data or as a gold standard for evaluation. But annota- 
tion exercises of this kind show high levels of disagreement between individuals, and 
also leave many corpus instances that appear to fit none of the available senses (Véronis 
2001). This may be taken to indicate that sense inventories are too fine-grained, too dis- 
joint, or too weakly mapped to usage. More simply, it may be taken to indicate that the 
presumption of a finite sense inventory is flawed (see Hanks 2000). 

These questions are also raised by the gap between explicit and implicit WSD. In 
explicit or ‘stand-alone’ WSD, the application takes an example of a word in use as its 
input, evaluates it against some inventory of senses, and returns the appropriate sense 
as output. In implicit WSD, by contrast, examples of words in use get partly or fully dis- 
ambiguated in the course of some other NLP task; such sense disambiguation as occurs 
may not be directly available for inspection, and may not be directly related to a sense
inventory per se. Most theoretical research has been concerned with explicit WSD, 
whereas most real-world WSD is of the implicit kind. For example, tasks like informa- 
tion retrieval and text summarization may all involve some element of sense disam- 
biguation, as a means to an end (Agirre and Edmonds 2006b: 2-3). 
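
The shape of an explicit WSD step (word in context as input, sense inventory as reference, sense as output) can be sketched with a simple gloss-overlap heuristic, in the spirit of the dictionary-based approach described above. The two-sense inventory for *bank*, the sense IDs, and the stopword list are all invented for this example; this is not a reconstruction of any published system.

```python
# Hypothetical stopword list, kept deliberately short.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "at", "to", "and", "or", "that"}

def content_words(text):
    """Lower-case, strip basic punctuation, and drop stopwords."""
    return {w.strip(".,;:") for w in text.lower().split()} - STOPWORDS

def disambiguate(context, inventory):
    """Return the sense ID whose gloss shares most content words with the context."""
    words = content_words(context)
    scores = {sense_id: len(content_words(gloss) & words)
              for sense_id, gloss in inventory.items()}
    return max(scores, key=scores.get)   # ties resolve to the first-listed sense

# Invented sense inventory (assumption for illustration).
BANK = {
    "bank.1": "an institution that accepts deposits of money and makes loans",
    "bank.2": "the sloping land at the edge of a river or lake",
}

print(disambiguate("she paid the money into her account at the bank", BANK))
print(disambiguate("they moored the boat on the river bank", BANK))
```

The sketch also exposes the weaknesses discussed in this section: the output is only as good as the inventory, and a context that overlaps with no gloss is still forced into the first-listed sense.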

It is tempting to suppose that implicit WSD is simply the incorporation of explicit 
WSD as a module within a larger system. But in reality this is rarely the case. Implicit 
disambiguation processes tend not to be modularized in ways that would allow disen- 
tanglement from the application as a whole. Sometimes this is because WSD is a kind of 
side-effect: for example, part-of-speech tagging is sufficient for some disambiguation, 
although this may not be the main purpose. More often, WSD is integrated as part of 
a broader set of strategies, and such WSD as takes place is oriented towards the overall 
objective of the application (see Ide and Wilks 2006). 

Progress in explicit WSD, relative to a determinate sense inventory, has therefore 
been difficult to convert to improved performance in real-world applications. This has 
led many observers to question whether it is meaningful to enumerate a sense inventory 
independently of a particular use-case: ‘Word senses are simply undefined unless there 
is some underlying rationale for clustering, some context which classifies some distinc- 
tions as worth making and others as not worth making’ (Kilgarriff 1997b: 109). 

The development of corpus-driven lexica has helped to address some of these issues 
by approaching the sense inventory in terms of usage rather than meaning. Whereas 
earlier dictionary senses are centred on a semantic definition (by paraphrase or by an 
enumeration of attributes), a corpus-driven lexicon attends more to the surface features 
that characterize examples, especially syntax and collocation (Véronis 2001). For the 
more hard-line advocates of this approach, meaning is inseparable from use, and the 
definition becomes a second-order gloss on surface features rather than the core com- 
ponent of the sense. 

In this view, the sense inventory is an emergent result of techniques to cluster together 
corpus examples according to their surface features. A word sense really just resolves to 
a cluster of contextually similar occurrences (Yarowsky 1993; Schütze 1998). This model
of word sense induction (WSI) implies a quantifiable feature space or landscape, rather 
than finite and disjoint senses. Paradigmatic uses are found at the centre of a cluster, 
whereas ambiguous examples are intermediate between clusters. These clusters may in 
turn be more or less closely grouped together, indicating related senses. 

This approach to the sense inventory makes WSD more viable, most obviously 
because it foregrounds surface features and therefore makes the disambiguation task 
more directly tractable. Techniques for automated clustering mean that sense invento- 
ries can be obtained for new corpora or new disambiguation scenarios, with little or no 
supervision (see Navigli 2009: 23-30). Sense inventories of varying kinds and levels of 
granularity can be induced by adjusting the weightings given to different features, or by 
changing the threshold at which a cluster is regarded as dense enough to be partitioned 
as a sense. The ability to ‘tune’ a sense inventory in this way is valuable when using WSD 
in different contexts. 
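
A minimal sketch of how a clustering threshold can 'tune' an induced inventory is given below. It assumes a deliberately crude setup: each corpus instance is reduced to a bag of context words, similarity is Jaccard overlap, and clustering is a greedy single pass. These choices, the function names, and the four toy contexts are all assumptions made for illustration; published WSI systems use richer features and more robust clustering.

```python
def jaccard(a, b):
    """Overlap between two feature sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def induce_senses(contexts, threshold):
    """Greedy clustering: join the first sufficiently similar cluster, else start a new one."""
    clusters = []  # each cluster holds pooled features and its member contexts
    for context in contexts:
        features = set(context.lower().split())
        for cluster in clusters:
            if jaccard(features, cluster["features"]) >= threshold:
                cluster["members"].append(context)
                cluster["features"] |= features
                break
        else:
            clusters.append({"features": set(features), "members": [context]})
    return clusters

# Invented context bags for occurrences of 'bank' (target word removed).
contexts = [
    "money deposit loan",
    "loan money interest",
    "river water mud",
    "water river fishing",
]

coarse = induce_senses(contexts, threshold=0.3)
fine = induce_senses(contexts, threshold=0.6)
print(len(coarse), len(fine))
```

Lowering the threshold merges instances into fewer, broader clusters; raising it splits them into more, narrower ones, so the same data yields inventories of different granularity.
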
These characteristics mean that induced sense inventories have tended to differ from
the sense inventories of older dictionaries, not only in the set of senses but also in their 
motivation. However, techniques akin to WSI are increasingly incorporated into the 
methodologies of more corpus-based dictionary projects, in order to prime the process 
of sense analysis. Although lexicographers are very likely to depart from or elaborate on 
the sense divisions that are initially suggested by such tools, dictionary sense inventories 
overall will tend to become better aligned with direct WSI. It has been argued that most 
real-world tasks involving WSD only need to make coarse-grained distinctions—chiefly 
between homographs, and cases of polysemy that are clear-cut enough to be comparable 
to homographs (Ide and Wilks 2006). At this level, we should expect good agreement 
between sense inventories deriving from similar corpora, even if that derivation is by 
different routes. 

Of course, a dictionary sense inventory is bound to be more absolute and less flexible. 
However, a well-designed entry structure should at least capture a hierarchical idea of 
major and minor sense distinctions, and make provision for minor senses to be col- 
lapsed into parent senses when required. 
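
As a rough sketch of what such provision might look like, the following assumes a hypothetical entry structure in which each sense may carry nested subsenses. The `Sense` class, the sense IDs, and the glosses are invented; the function maps every sense to the ancestor that represents it at a chosen depth.

```python
from dataclasses import dataclass, field

@dataclass
class Sense:
    sense_id: str
    gloss: str
    subsenses: list = field(default_factory=list)

def collapse_map(senses, depth, parent=None, level=1):
    """Map each sense ID to the ID that stands for it when the hierarchy is cut at `depth`."""
    mapping = {}
    for sense in senses:
        # Senses at or above the cut represent themselves; deeper ones fold into the parent.
        representative = sense.sense_id if level <= depth else parent
        mapping[sense.sense_id] = representative
        mapping.update(collapse_map(sense.subsenses, depth, representative, level + 1))
    return mapping

# Invented entry for 'bank' (assumption for illustration).
bank = [
    Sense("1", "a financial institution", [
        Sense("1a", "the building occupied by such an institution"),
        Sense("1b", "the stock of money held by the dealer in a game"),
    ]),
    Sense("2", "sloping land beside a river"),
]

print(collapse_map(bank, depth=1))   # coarse inventory
print(collapse_map(bank, depth=2))   # full inventory
```

With `depth=1` the minor senses 1a and 1b are reported as sense 1, which is the coarse-grained view that most practical WSD tasks are said to need.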

Thus corpus-based dictionaries are becoming better oriented towards the kind of 
sense inventory and sense motivation that is pertinent to practical WSD tasks. Again, 
this is not to suggest that dictionaries will ever return to a central role in WSD, but rather 
that there is scope for dictionaries to provide supplementary information to refine, test, 
and corroborate such processes, or to articulate core WSD processes with other layers of 
an application. This supplementary or articulatory role is explored in more detail in the 
following sections. 


31.6 WORDNET, GRAPHS, AND ONTOLOGIES 


Dictionaries are conventionally structured as a list of headword-based entries, each 
entry containing in turn a list of senses with associated information. In print form, this 
‘list of lists’ paradigm treats words in isolation from each other, rather than participating 
in a language system. This paradigm has been largely conserved not only when encod- 
ing print dictionaries as MRDs, but also when initiating electronic dictionaries from 
scratch. 

The headword-first model of the list-like dictionary has some subtle but pervasive 
consequences. It means that dictionaries espouse a largely word-centred view of the 
lexicon, rather than, say, a conceptual, combinatory, or behavioural view.⁴ Whatever the
merits or demerits of this word-centred view with respect to the needs of the human 
user, it puts significant limits on the utility of dictionary data for NLP applications. 


4 The structural isolation of entries is mitigated by cross-references, but these are rarely systematic. 
Cross-references lack semantics: there is no formal indication of the nature of the relationship being 
asserted, which must instead be gleaned from the discursive context in which the reference appears. 
Alongside dictionaries, indexed thesauri have provided a complementary means of
access: broadly, a thesaurus implements a concept-centred view of the lexicon. But while 
dictionaries and thesauri have remained separate entities (theoretically and physically), 
they represent two sides of the same deficiency, rather than a solution to that deficiency. 

The development of the Princeton WordNet project (<wordnet.princeton.edu>) from 
the late 1980s was in part a response to this structural limitation of conventional dic- 
tionaries (see Miller et al. 1990). WordNet began as a psycholinguistic project to model 
aspects of language processing in the brain, and sought to represent the lexicon as a system
of concepts rather than as a collection of words.⁵ The lexicon is modelled as a structured
network of ‘lexical units’ (the pairing of a lemma and a meaning, approximating to
a dictionary sense). WordNet groups lexical units into sets of synonyms (‘synsets’) sharing
a common definition. Synsets are in turn linked to each other by various semantic
relations (hypernymy, meronymy, entailment, etc.).⁶

WordNet is thus thesaurus-like in microstructure (a collection of synonym sets) and 
ontology-like in macrostructure (a set of relationships between concepts). In essence 
it is a directed graph, with synsets constituting the nodes, and semantic relationships 
constituting the edges. 
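
The directed-graph view can be sketched in a few lines. The synset IDs, lemmas, and edges below are invented toy data, not taken from the WordNet database, and the relation names are labels chosen for this example rather than WordNet's own pointer symbols.

```python
# Nodes: synsets, each a set of synonymous lemmas (toy data, not from WordNet).
SYNSETS = {
    "S1": ("dog", "domestic dog"),
    "S2": ("canine", "canid"),
    "S3": ("carnivore",),
    "S4": ("paw",),
}

# Edges: (source synset, relation, target synset).
EDGES = [
    ("S1", "hypernym", "S2"),
    ("S2", "hypernym", "S3"),
    ("S4", "part_of", "S1"),
]

def targets(node, relation):
    """Follow outgoing edges of one relation type from a node."""
    return [t for s, r, t in EDGES if s == node and r == relation]

def hypernym_chain(node):
    """Walk 'hypernym' edges upwards, taking the first parent at each step."""
    chain = []
    while True:
        parents = targets(node, "hypernym")
        if not parents or parents[0] in chain:   # stop at the top, or on a cycle
            return chain
        node = parents[0]
        chain.append(node)

print([SYNSETS[s][0] for s in hypernym_chain("S1")])
```

The thesaurus-like microstructure lives in `SYNSETS` and the ontology-like macrostructure in `EDGES`; keeping the two in one graph is what allows a query to move from a word to a concept and on to related concepts.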

WordNet soon established itself as the most widely used lexical resource for NLP, dis- 
placing conventional dictionaries. WordNet’s open licensing terms facilitated its adop- 
tion as a shared standard (Kilgarriff 2000); and its formalism as a lexical database made 
it more tractable than MRDs. As a knowledge base of semantic relationships, WordNet 
has proved a useful complement to corpus-based methods of text analysis, in tasks such 
as information retrieval, text classification, and WSD (see Leacock et al. 1998). 

WordNet is sometimes criticized as an uneasy hybrid of the two organizing principles 
of thesaurus and ontology (see, for example, Zaenen 2002: 234). But this can equally be 
seen as its strength: it has the effect of folding lexical and conceptual information into 
a single graph, and so puts WordNet in a good position to mediate between linguis- 
tic analysis on the one hand, and knowledge bases on the other. This has become more 
important as much NLP research has gravitated towards knowledge management tasks 
(information retrieval, inference, summarization, etc.). 

From an NLP perspective, WordNet has its own limitations. WordNet represents the 
lexicon primarily as a system of synonymy and hypernymy, which can give a decep- 
tively neat and formulaic picture. It is not well equipped to express more oblique and 
indeterminate relationships between concepts, which may be salient features of usage 
events (Ferret and Zock 2006). Sense distinctions are too fine-grained for most WSD 
purposes: the synset paradigm attends more to similarities between senses of different 
lemmas than to differentiating senses of the same lemma. 


5 The original WordNet project was for English, but has subsequently been emulated for many other
languages: see <globalwordnet.org>.

6 Additionally, lexical units may be linked to each other by lexical relations (notably antonymy) that
are independent of the synset as a whole.
Given that WordNet originated as an attempt to model the mental lexicon, not to model
linguistic phenomena observed in context, such limitations are not surprising. The widespread
use of WordNet in spite of this is instructive: WordNet may be an imperfect resource
considered in its own right, but it provides a good lexico-semantic framework for articulating
other resources and techniques within a larger NLP system. As a concept-led graph,
WordNet can be readily extended outwards to other ontologies, or enriched internally with
new semantics.⁷

WordNet’s utility is further increased by its participation in larger information graphs.
Notable among these is the Linked Data initiative (Heath and Bizer 2011), which aims to 
connect independently-authored resources into a global data space. A version of WordNet 
implemented as RDF/OWL plays a significant role in the Linked Data infrastructure, in 
particular helping to parse and structure encyclopedic resources like Wikipedia into for- 
mal ontologies (van Assem et al. 2006; Bizer et al. 2009). The Linked Data project is still 
in development, but is already suggestive of the value of granular, graph-based interaction 
between lexical information and encyclopedic knowledge. 

The graph-like WordNet model appears radically different from the list-like dictionary 
model. However, a number of lexicologists have argued that dictionaries can be understood
as implicit graphs. Fontenelle, for example, conceptualized dictionaries in terms that
now sound very WordNet-like: ‘The lexicon is no longer seen exclusively as a repository of
irregularities and idiosyncratic information, but must be viewed as a network whose nodes
(the actual words) are closely interrelated in a web-like fashion. The arrows that connect
those nodes are labels for lexicosemantic relations’ (Fontenelle 1990: 99). His analysis first
of LDOCE and then of the Collins Robert French Dictionary (Fontenelle 1997a) involved the
exhaustive enumeration of regular patterns in definitions in order to derive the links for
such a network. More recently there have been several experiments to model dictionaries in
RDF/OWL, the standard formalism for information graphs (see Spohr 2012).

Dictionaries are not reducible to graphs, but it is certainly true that dictionaries have 
(potentially) some important graph-like characteristics. The utility of dictionary data 
can be enhanced by encoding the graph-like elements more explicitly. As the example of 
WordNet suggests, this may serve as a framework both for articulating information within 
the dictionary and for articulating the dictionary with other resources. 

Such a graph may be developed in both bottom-up and top-down fashion. Bottom-up, 
any connections that are explicit or implicit in the content may be formalized as directed 
links between nodes. As Fontenelle suggests, this includes not only existing cross-references 
but also any point-like lexical features, such as collocates, synonyms, domain labels, and 
salient terms in definitions (especially genus terms). Top-down, taxonomic classification 
schemes may be imposed on dictionary content. These may be encoded as links between 
lexical items and class terms, expressing relations of hypernymy, domain membership, etc. 


7 See <http://wordnet.princeton.edu/wordnet/related-projects/#extensions> for some examples.
8 A number of online dictionaries overlay classifications on to core dictionary content, enabling the 
user to navigate laterally along lines of semantic, associative, or domain relationships. These range from 
The bottom-up and top-down approaches will overlap at many points, so need to
be engineered in tandem. It makes sense to adopt a single formalism—if not RDF/ 
OWL, then certainly a model that can be readily converted to and from RDF/OWL. 
Importantly, this means that every link from one node to another is typed, that is, carries 
a predicate in a standardized vocabulary to define the nature of the relationship.⁹

What counts as a node? Essentially, every data point should be addressable 
(whether connected to a graph or not). In practice, the cardinal entities are senses 
(lemma+meaning couples, similar to WordNet’s lexical units). From this point of view,
entries are more usefully treated as second-order entities serving to encapsulate shared 
features such as orthography, phonetics, and etymology. This perspective ensures 
appropriate granularity: most of the salient relationships asserted in a dictionary-based 
graph are properly between senses, not between words. 
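
A minimal sketch of this sense-centred, typed-link model follows. The entry record, the identifiers, and the `ex:` predicates are all hypothetical; they stand in for whatever standardized vocabulary a real project would adopt, and the plain tuples stand in for RDF triples without using any RDF library.

```python
# Hypothetical dictionary entry: the entry holds shared features,
# while the senses are the addressable nodes that carry relationships.
ENTRY = {
    "id": "entry/bank_n",
    "lemma": "bank",
    "senses": [
        {"id": "sense/bank_n_1", "genus": "sense/institution_n_1", "domain": "domain/finance"},
        {"id": "sense/bank_n_2", "genus": "sense/slope_n_1", "domain": "domain/geography"},
    ],
}

def entry_to_triples(entry):
    """Emit (subject, predicate, object) links; every link carries a typed predicate."""
    triples = [(entry["id"], "ex:lemma", entry["lemma"])]
    for sense in entry["senses"]:
        triples.append((sense["id"], "ex:senseOf", entry["id"]))      # sense -> entry
        triples.append((sense["id"], "ex:hasGenus", sense["genus"]))  # bottom-up: genus term
        triples.append((sense["id"], "ex:inDomain", sense["domain"])) # top-down: class term
    return triples

triples = entry_to_triples(ENTRY)
for triple in triples:
    print(triple)
```

Note that the genus and domain links attach to individual senses rather than to the headword, which is the granularity argued for above; the entry appears only as the object of `ex:senseOf` and as the bearer of the shared lemma.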

Encoding features in graph form helps to articulate the lexicon as a system, and 
provides a formal abstraction for information which the dictionary itself may treat 
more variably or discursively. In addition, as the examples of WordNet and Linked 
Data indicate, a key benefit is the potential for interoperability with other informa- 
tion graphs. An important step in that direction is to make data points addressable 
as nodes by assigning a persistent and public ID to each. In the Linked Data model, 
resource IDs are published as dereferenceable HTTP URIs: that is, URIs which can be
requested by an HTTP client, and for which a description of the resource is returned 
(Heath and Bizer 2011). This requires some moderate technical infrastructure, but 
more significantly requires making public a minimal level of data from the underlying 
dictionary: a public URI is pretty useless if it is not possible to determine what the URI 
identifies. 

More generally, a dictionary will be able to participate in wider data infrastruc- 
tures only to the extent that the objects it represents can be openly identified and 
addressed. The widespread use of WordNet is more often attributed to its open 
licensing terms than to its lexicographic qualities. Although commercial publishers 
are unlikely to go as far as academic projects in this respect, recent years have seen 
some positive responses to the conundrums of intellectual property and dictionary 
access: in the same way that freely available websites now showcase many major dic- 
tionary titles for human readers, it should be viable to publish freely available indices 
of dictionary content for computational clients. The challenge is to make this mean- 
ingful without exposing the full content, which may be subject to more restrictive 
licensing terms. Graph-based models, as abstractions from the full content, suggest 
one solution to this. 


informal ‘folksonomies’ (e.g. Wordnik, <https://www.wordnik.com>) to more formal taxonomies 
(e.g. Oxford Dictionaries, <www.english.oxforddictionaries.com>). 

9 The data expressing a directed link is sometimes referred to as a triple, since it consists of three
entities: a subject, a predicate, and an object. 
31.7 CONNECTEDNESS AND AVAILABILITY 


The declining research interest in dictionary data may be taken to suggest that lexicog- 
raphy has been left to pursue the peculiar needs of human readers, while the mainstream 
of linguistic data development rolls forward in the service of computational needs, 
building corpora, ontologies, and bespoke lexical databases. There is some truth in this. 
Yet many dictionaries are becoming better aligned with language processing needs, due 
in large part to the influence of usage-oriented corpus linguistics. While dictionaries no 
longer have a central role in NLP, they may be a useful source of supplementary infor- 
mation for other systems. 

A characteristic of dictionaries designed for human readers, compared to other 
lexical resources, is their heterogeneity: a dictionary may integrate morphological, 
orthographic, phonetic, syntagmatic, collocational, associative, semantic, etymolog- 
ical, encyclopedic, and ontological information. For extensively studied languages, 
especially English, much of this information will be common to other dictionaries
and to existing lexical knowledge bases. Today, a computational linguist or NLP
engineer has no need to duplicate lexical facts already well captured elsewhere. But 
it may be useful to extract specialized subsets of these features to plug gaps in other 
resources. 

NLP applications are necessarily opportunistic, combining information from multi- 
ple sources. Rather than any grand unified lexical database, successes have come from 
more targeted task-specific resources and tools (Zaenen 2002: 242; Ide and Véronis 
1993: 262). Coordination problems abound. Aspirations of completeness and com- 
prehensiveness are therefore less important than connectivity and interoperability. 
Practically, this requires standardized data models and efficient mappings between data 
sets (Romary 2013). Organizationally, it requires making lexical information available as 
assets to be mined and recombined. 

Dictionaries have been notably isolated until very recently. Whereas mapping
between different resources is a familiar topic in the context of knowledge bases (see
Euzenat and Shvaiko 2007), it is curiously underexplored in lexicography. This may be 
attributed to concerns about intellectual property, but it may also stem from a purist 
idea of the dictionary as a complete and autonomous account of the lexicon. This isola- 
tion curtails the currency of dictionary data, compared with the more promiscuously 
interconnected world that WordNet fosters. (Indeed, one of the most efficient ways to 
enhance a set of dictionary data would be to publish mappings to WordNet itself.) 

There is a contrary strand in modern lexicology that advocates a turn away from supposedly
general-purpose dictionaries and towards more targeted, problem-specific
lexical tools, designed for use in combination with each other—what Tarp calls ‘lexi- 
mats’ (Tarp 2008: 120-1; see also Verlinde et al. 2010). As dictionary developers adapt 
from print to the more versatile possibilities of electronic platforms, it is likely that 
this will become a more significant lexicographic paradigm. Hence there may be a 
partial convergence between lexicography for human users and lexical resources for
computational purposes. 

These more modest niche roles will not bring the kind of revenue streams that 
NLP once seemed to promise for publishers: licensing revenue is now more likely to 
come from the reuse of content on new platforms (apps, mobile devices, etc.). For 
a dictionary developer, the value of participating in NLP research is lexicographic 
rather than directly financial: in the opportunities to analyse, test, and enhance lexi- 
cal information, and conversely to integrate NLP techniques into editorial practice. 
32.1 INTRODUCTION 


In so far as words as linguistic symbols have a semantic relationship with non-linguistic
reality—the world—then the world is always present in dictionaries. Nevertheless, 
within dictionary entries, it is possible to distinguish linguistic information in the nar- 
row sense, for example about pronunciation, spelling, and meaning, from encyclopae- 
dic and cultural information, for example about a device or a social situation. This kind 
of information can in turn also be conveyed in written form or pictorially by means of 
illustrations or videos. For this reason, these topics will be dealt with together in this 
chapter. 

Encyclopaedic and cultural information, as well as illustrations, enrich a diction- 
ary, but also present lexicographers with particular conceptual and practical chal- 
lenges. This is especially the case now that it is possible, in electronic dictionaries and 
particularly in internet dictionaries, to offer users a wider variety of illustrations and 
a range of encyclopaedic and cultural information, not only through illustrations, but 
also through the inclusion of videos, for example, or by means of links with external 
sources and reference works. In this chapter, the practice in printed as well as elec- 
tronic dictionaries will be presented and discussed. In addition, research into the use 
of dictionaries in relation to illustrations and encyclopaedic information in dictionaries
will be discussed.


32.2 ILLUSTRATIONS 


An illustration is a particular kind of image, which is used in conjunction with a text and 
which decorates, illustrates, or explains the text. With illustrations, there is always a par- 
ticular relationship between the (printed/written or electronically published) text and the 
image; illustrations always have a particular function. Images are iconic symbols (Peirce 
1932 [1867]), just as words are linguistic symbols. 


32.2.1 The Relationship between Text and Images 


When text and image are combined in a dictionary, both are visible at the same time 
and refer to each other. At best, there is a complementary relationship between the 
definition of a headword and the illustration, so that the whole meaning can be ascer- 
tained from the definition and the illustration, with the illustration completing the 
text and vice versa.¹ This is generally effective when a written description of the external
form of an object reaches its limits, and the visualization of the object in an illustration
contributes to a better understanding. Ultimately, however, it is generally the
case that information is retained in the memory better when it is conveyed through 
the use of both text and images. Dictionaries can make use of these advantages in a 
variety of ways. 


32.2.2 Characteristics of Illustrations in Dictionaries 


Illustrations are mostly found in dictionaries which are intended for use in text reception 
(Svensén 2009: 299), but also in learners’ dictionaries (see Heuberger, this volume; Ilson 
1987; Nesi 1989, 2000) and in dictionaries for children (Landau 2001: 147ff.).² To enable
them to fulfil their function of explaining the meaning of headwords, it is important that 


1 Illustrations cannot therefore replace a definition, and similarly, a definition cannot convey all the
information an illustration can (Landau 2001: 143f.). For suggestions on how to represent meaning in an 
electronic dictionary through the use of additional multimedia elements, see Lew (2010). 

2 In bilingual dictionaries, by contrast, illustrations are superfluous, as the meaning of the headword 
is unambiguously explained through its translation. Despite this, even bilingual dictionaries can contain 
illustrations, labelled in the two languages. Lew and Szczepaniak (2011) discuss the usefulness of imagery 
in the form of pictorial illustrations in idioms dictionaries. 
the number, concentration, and size of the illustrations, and how they are positioned in
the dictionary, are carefully devised. The size of an illustration and its positioning are also 
closely linked (Hupka 1989a: 705). 

In a printed dictionary, illustrations can be situated in the immediate vicinity of a 
headword,³ or sometimes they can be outside the type area in the margin.⁴ Illustrations
can also be combined into image groups or whole plates.⁵ If an illustration is not in the
immediate vicinity of the headword, then it must refer to the headword, and vice versa. 
Lexemes which do not appear in the dictionary as headwords should not be illustrated. 
In electronic dictionaries, there is enough space on the computer screen to enable 
illustrations to always be displayed at the same time as the associated word entry (see 
Figure 32.1). However, illustrations can also be displayed in a (smaller) window which 
is opened separately; this is particularly useful when a dictionary is being accessed on a 
smartphone, for example. 

Depending on the amount of space available, illustrations in printed and electronic 
dictionaries are of very different sizes. As well as illustrations the size of a postage 
stamp, there are also full-page images.⁶ In an electronic dictionary, illustrations can be
integrated in such a way that they can be made larger or smaller by the user as required 
(see Figure 32.1; a click on one of the illustrations shown here will open a new window 
with a larger image). Depending on the amount of space needed and the type of head- 
words dealt with in the dictionary, the system for and concentration of the illustra- 
tions must be considered. Not all the headwords in a dictionary are equally suited to 
being illustrated. The entries which are illustrated most frequently in printed as well 
as electronic dictionaries are of course concrete nouns (see Figure 32.1), but adjectives 
(e.g. colour terms⁷) or prepositions (e.g. prepositions of place⁸) can also be illustrated.
Headwords which lend themselves to illustration should be illustrated as thoroughly as 
possible and not just partially. In lexicographical practice, the relationship between the 
number of headwords and the number of illustrations is very variable, and this is the 
case in printed (Hupka 1984, 1989a) as well as electronic dictionaries (Kemmer 2014a).

Just as variable are the illustration techniques used in dictionaries. There are black-and-white
images and colour illustrations,⁹ drawings, and photographs (see Figure 32.1).
In printed dictionaries, predominantly black-and-white drawings are used, whereas in 
electronic dictionaries, there is less of an economic disincentive to include colour illustra- 
tions, and so these are more common. The choice of illustration technique should not, 


3 For example, the illustrations in Langenscheidt Taschenwörterbuch Deutsch als Fremdsprache
(LTDaF).

4 For example, the illustrations in The American Heritage Dictionary of the English Language (AHD).

5 See, for example, the plate ‘Die Zeit’ in LTDaF.

6 See the illustrations in AHD and the full-page plates in LTDaF.

7 See, for example, the plate ‘Farben’ in LTDaF.

8 See, for example, the plate ‘Präpositionen’ in LTDaF.

9 See the coloured plates and black-and-white images in LTDaF.
glass
1 [uncountable] a transparent solid substance used for making windows, bottles etc:
   a glass bowl
   a piece of broken glass
   pane/sheet of glass (=a flat piece of glass with straight edges)
   the cathedral's stained glass windows

2 [countable] a container used for drinking made of glass [→ cup]
   wine/brandy/champagne etc glass
   Nigel raised his glass in a toast to his son.

3 [countable] the amount of a drink contained in a glass
   glass of
   She poured a glass of wine.

4 glasses [plural] two pieces of specially cut glass or plastic in a frame, which you
wear in order to see more clearly [= spectacles]:
   He was clean-shaven and wore glasses.
   I need a new pair of glasses.
   distance/reading glasses
   Do not say 'a glasses': She's got nice (NOT a nice) glasses. → DARK GLASSES, FIELD GLASSES


FIGURE 32.1 Entry glass in LDOCEonline


however, be determined by cost alone, but above all by the information content and 
desired function of the illustration.¹⁰

In both printed and electronic dictionaries, image and text are closely related to each 
other: the word entry provides information about the headword, which is supplemented 
by the illustration, with the caption (legend) guiding how the image is interpreted 
(Barthes 1964: 44f.). In addition to this, identifying labels, which support the correct 
interpretation of the illustration, can appear on the illustration itself (Stein 1991: 112ff., 
see Figure 32.1). The illustration supplements the dictionary text (in particular the 


10 For a comparison of drawings vs. photographs and black-and-white vs. colour pictures in terms of
their performance and function, see Hupka (1989a: 708ff., 1989c: 206ff.). 
definition) with regard to those aspects which can be expressed in writing either only
inadequately or not at all (see also Section 32.2.6). 


32.2.3 Types of Illustration in Dictionaries 


Different types of illustration are used in dictionaries (according to Hupka 1984, 1989a), 
depending on the kind of headword being illustrated.¹¹ It is possible to differentiate
between simple and complex illustrations. Unique illustrations, which are the most 
common type of illustration in dictionaries, show precisely one object (see Figure 32.1). 
Multiple illustrations present an object in various forms (e.g. different breeds of dogs 
for the headword dog). In sequential illustrations (used especially with verbs), the same 
event or object is illustrated in a number of phases which follow on from one another, 
and this gives the impression of movement. In structural illustrations, only one part of an 
object is shown, which is connected to a larger whole. These parts are often highlighted 
through the use of a frame or arrow. Functional illustrations describe the inner structure 
of an item and how, for example, a device works. A nomenclatural illustration not only 
illustrates the headword, but also provides the vocabulary of whole specialist areas in a
complex depiction. Scenic illustrations are even more extensive; they show the headword 
being described in an everyday scene (e.g. at the shops, at the railway station). Scenic 
illustrations also lend themselves to illustrating a particular group of headwords (e.g. 
prepositions of place). Functional schemata go beyond functional illustrations in that 
they show whole processes (e.g. the manufacturing of paper). Flow charts are typical 
examples of these. Encyclopaedic illustrations in the narrow sense are those which illus- 
trate an abstract headword only indirectly through a visible example of a partial aspect 
of the word (e.g. when as an illustration of the headword archeology an excavation site 
is shown). All the illustration types mentioned here always convey encyclopaedic infor- 
mation (Svensén 2009: 298). 


32.2.4 Illustrations in Printed Dictionaries—Examples 
and Analysis 


Illustrations are most frequently used in general monolingual dictionaries, although 
there are clear differences here in the lexicographical tradition of individual coun- 
tries. There are, for example, relatively few German-language illustrated dictionaries, 
whereas there are many in French, Spanish, and English. Two very different general 


11 For other typologies, see Hupka (1989a: 710ff.) and Svensén (2009: 303ff.), as also for examples for
each type.
12 For an overview of the historical development of this, see Hupka (1989c: 67-144).
English-language dictionaries will be presented as examples here: the Illustrated Oxford
Dictionary (IOD) and The American Heritage Dictionary of the English Language (AHD). 

IOD is characterized by a very high concentration of illustrations, as the title suggests. 
The choice and positioning of the illustrations have clearly been carefully devised. The 
illustrations, which are in colour throughout (and which are mostly unique, although 
there are also multiple, functional, and nomenclatural illustrations, drawings, and pho- 
tographs), depict not only nouns, but also adjectives and verbs. Many illustrations are 
integrated into the text of the dictionary and linked to the text through the use of cap- 
tions; in the illustrated headwords, reference is made to the illustrations. Next to these 
are panels, which are presented separately from the dictionary text, but which relate to 
individual headwords. The panels can be multiple illustrations or nomenclatural illus- 
trations, or a hybrid of the two. Overall, this dictionary presents a convincing, aestheti- 
cally appealing, modern illustration concept for a printed dictionary. 

AHD is likewise illustrated in colour (drawings, photographs, schemata, diagrams),
although the concentration of illustrations is lower than in IOD: of
approximately 200,000 headwords, about 4,000 are illustrated. The illustrations (pre- 
dominantly unique illustrations, but also multiple and sequential) are situated in the 
margins throughout, and are therefore not in the immediate vicinity of the headword. 
Furthermore, because of this positioning, their size is restricted. While the captions of 
the pictures link the illustrations to the headwords, there are no references in the head- 
words to the pictures. Nouns as well as verbs and adjectives are illustrated. The lack of 
references between headwords and illustrations, and the small pictures are—in com- 
parison with the illustration practice in other dictionaries—not very convincing. 

An example of an illustrated learners’ dictionary for English is the Oxford Advanced
Learner’s Dictionary (for more on this, see Heuberger, this volume). The following
are two examples of illustrated learners’ dictionaries for German: Langenscheidt
Taschenwörterbuch Deutsch als Fremdsprache (LTDaF) and Hueber Wörterbuch Deutsch
als Fremdsprache (HWDaF). LTDaF, which is aimed at beginners and intermediate
learners, will be presented briefly as an example here. In addition to black-and-white 
drawings, it contains twelve full-page colour scenic plates, which are bound in the mid- 
dle of the dictionary. The headwords illustrated here refer to the plates (although not 
always). It is predominantly nouns which are illustrated, but also verbs, colour adjec- 
tives, and prepositions. The other illustrations consist of almost fifty black-and-white 
drawings, which are predominantly unique illustrations, but there are also multiple 
illustrations. In this case as well, there is insufficient interlinking between the head- 
words and the illustrations, although their positioning in the dictionary text does help 
to establish a connection between headword and illustration. Overall, the concentration 
of around fifty illustrations for about 30,000 headwords is very low, which can also be 
said of other German learners’ dictionaries. The practice adhered to in these dictionar- 
ies clearly shows that the tradition of illustration is in these cases still young and expand- 
able, which also applies more widely to general monolingual printed dictionaries of 
contemporary German. 
The picture dictionary represents a particular kind of dictionary (Scholze-Stubenrecht
1990), which can be monolingual, bilingual, or even multilingual. A picture dictionary
only contains headwords which can act as descriptions of what is shown in the
pictures. Usually, picture dictionaries are onomasiologically constructed and do not
contain definitions. Examples of monolingual picture dictionaries are Duden—Das
Bildwörterbuch and the educational children’s picture dictionary Oxford English Picture
Dictionary—Monolingual Edition. Based on Duden—Das Bildwörterbuch, there are a
number of bilingual pictorial dictionaries, for instance Oxford-Duden Bildwörterbuch
Deutsch und Spanisch.


32.2.5 Illustrations in Electronic Dictionaries—Examples 
and Analysis 


Electronic dictionaries can either be transferred from printed form into electronic form 
(mostly dictionaries on CD-ROM or internet dictionaries), or they can be developed 
specifically in electronic form (almost exclusively internet dictionaries, but also hand-held
dictionaries and robust-machine dictionaries; de Schryver 2003: 151).¹⁴ In both
forms, as well as illustrations, additional media can be used, in particular videos and 
audio files; these will not, however, be dealt with in depth here. 

An example of a retrodigitalized English learners’ dictionary is the Longman Dictionary
of Contemporary English Online (LDOCEonline). Taken from the printed version, illustrations
of headwords (e.g. computer) or individual meanings of headwords (e.g. glass
2 ‘for drinking’ and glass 4 ‘for eyes’ in Figure 32.1) are presented in the context of the
relevant dictionary entry, and can be displayed enlarged in a separate window at the
click of a mouse.

These colour photographs and drawings are predominantly unique illustrations, 
but they also include structural, scenic, and functional illustrations. It is not possible 
to determine the concentration of illustrations, as there is no information on the num- 
ber of illustrations. Whenever a headword or an individual meaning of a headword is 
illustrated, this is done with precisely one illustration (in principle, due to the unlim- 
ited space available online, it would be possible to provide several illustrations for the 
same headword; see Section 32.2.6). The illustrations can only be accessed semasiologi- 
cally, that is, via the individual headwords, because there is no index of illustrations. The 
illustrations are not onomasiologically organized either, that is, according to groups of 
items. There are no links between word entries and illustrations, and in order to see an 
image, two clicks are normally needed. From these facts, it is clear that the opportunities 


13 For other types of picture dictionaries, see Hupka (1989a: 718f.).
14 On illustration practice in the online dictionaries presented here and in other online dictionaries in
2010, see the relevant section in Kemmer (2014a).
offered by the electronic medium are not made use of in this dictionary (as is the case
with many other retrodigitalized dictionaries). 

The collaborative, multilingual online dictionary elovivo contains both manually 
chosen, individual illustrations for some headwords (e.g. towel), and (under the head- 
ing ‘More Images’) several illustrations which are automatically taken from an image 
database (e.g. melon).¹⁶ Nouns, as well as verbs, adjectives, prepositions, and adverbs
are illustrated in this way, through the use of unique colour drawings and photographs, 
which can be seen straight away in the immediate context of the headword. As well as 
semasiological access, an illustration index also allows access to the illustrations, and 
in addition, there are attachments for onomasiological access (e.g. via the ‘Popular 
Dictionary Tags’ such as School, headwords such as alphabet, bulletin, chalk, etc. are cat- 
egorized with the relevant illustrations). What stands out about elovivo, in comparison 
with the retrodigitalized LDOCEonline, is that it makes better use of the opportunities 
presented by the online medium, for example by linking to other online sources (in this 
case image databases) and by offering other ways of reaching headword entries with 
illustrations, in addition to semasiological access. 

The market for illustrated online dictionaries in general (including picture dictionar- 
ies such as The Internet Picture Dictionary, illustrated technical dictionaries such as the 
Virginia Tech Multimedia Music Dictionary, and children’s dictionaries such as A Maths 
Dictionary for Kids) is large and diverse. As is clear from the examples from general lan- 
guage dictionaries presented above, there is in general (still) no standard practice in 
relation to illustrations. However, there is a clear tendency that those dictionary types 
which are also published as printed dictionaries are illustrated, and also that those types 
of illustration that are already known are used. The medium has a tendency towards a
higher concentration of images, and in addition to this, the illustrations can be viewed 
in different sizes and are searchable by means of new access facilities. The positioning of 
the illustration(s) on the screen still varies, on the other hand, but they appear mostly in 
the context of the definition. Well-made online dictionaries also have numerous links 
between dictionary text and illustrations. Further developments are awaited. 


32.2.6 Usage Research on Illustrations in Dictionaries


Up until now, research into illustrations in dictionaries has only been carried out on a 
small scale.¹⁷ There are a number of questions which need to be clarified through the use


15 Incidentally, LDOCEonline contains hardly any other multimedia elements (i.e. no charts,
diagrams, or videos). For words beginning with D or S, there is information about pronunciation (audio 
files for British and American pronunciation, pronunciation in an example sentence). 

16 The only additional multimedia element in this dictionary is information on pronunciation 
(individual word and example sentence). There are no other such elements, e.g. videos, charts, or 
diagrams. 

17 There are psychological studies of perception on the subject of image interpretation itself, e.g.
Héger (2005). 
of different research methods into dictionary use, for example the fundamental question
of how useful dictionary users consider illustrations to be. In an online study of the
internet dictionary elexiko, for example, 155 of 685 participants stated that in an online 
dictionary they expected the meaning of a headword to be clarified through the use of 
one or more illustrations (see Klosa et al. 2014: 302). Fifty per cent (344) of the 685 participants
in this study normally expected internet dictionaries to contain multimedia
elements such as illustrations (Klosa et al. 2014: 310). Questions such as whether users
look at both the definition of a headword and the accompanying illustration, or whether
they concentrate solely on the text or solely on the picture, could also be investigated. 
The initial results of an online study and an eye-tracking study on this topic, available 
in Kemmer (2014b), lead to the conclusion that in fact both the (definition) text and the 
illustrations are looked at. Lew and Doroszewska (2009: 248) report on the ‘low retention
rate associated with picture-only lookups’, indicated by a usage study, when looking
up lexical items denoting facial expressions and body reflexes. Retention rates in this
study were higher when text (L1 translation) as well as an animated picture was looked
up. The relationship between image and text in illustrated dictionaries (see Section
32.2.1) should be all the more carefully conceived and implemented. 


32.3 ENCYCLOPAEDIC AND CULTURAL 
INFORMATION IN DICTIONARIES 


Encyclopaedic and cultural information in a dictionary can be given not only through 
illustrations, but also in text. This mainly applies to general monolingual dictionaries, 
but also, more rarely, to bilingual dictionaries or learners’ dictionaries¹⁸ (Svensén 2009:
292ff.). Encyclopaedic information includes details and explanations about the item 
denoted by the headword (e.g. about the habitat of an animal, or how a tool is used). 
Encyclopaedic information is therefore principally to be found in reference works which 
relate to concrete items, such as encyclopaedias, but it is also contained (consciously or 
unconsciously) in language-related reference works such as dictionaries, especially with 
content words such as nouns and adjectives (much less so with function words such as 
pronouns and conjunctions). 

Cultural information includes, for instance, statements about what typically happens 
in the place denoted by the headword (e.g. restaurant, café, flat) in a particular cultural 
circle, that is, culture-specific contexts. In a dictionary, this can appear in the defini- 
tion, but also in the examples and collocators of the headword. Cultural information is 
therefore neither linguistic nor encyclopaedic in nature, but rather it describes the artis- 
tic, intellectual, academic, economic, and social aspects of a headword, which are, how- 
ever, not strictly necessary to explain its meaning. Cultural information also includes 


18 For the importance of encyclopaedic information in learners’ dictionaries, see Cowie (1983: 136ff.).
information about associations and connotations, which, as a result of certain cultural
peculiarities, are associated with the use of a word.¹⁹


32.3.1 Dictionary—Encyclopaedic 
Dictionary—Encyclopaedia 


Different dictionaries contain different amounts of encyclopaedic information, and it 
is possible to distinguish between different types of dictionary on this basis (Svensén 
2009: 290ff.): (i) the purely linguistic dictionary (‘language dictionary’); (ii) the dic- 
tionary with some encyclopaedic features; (iii) the so-called encyclopaedic dictionary 
(Hupka 1989b); and finally (iv) the encyclopaedia (‘specialist dictionary’). The boundaries
between these types are fluid and, in practice, hybrid forms are common.²⁰

This is most evident with the encyclopaedic dictionary, which combines features of 
pure language dictionaries and encyclopaedias: it treats the headwords of all parts of 
speech in the same way as a normal dictionary, but it treats proper names as would an 
encyclopaedia. Like encyclopaedias, encyclopaedic dictionaries contain more special- 
ist vocabulary than normal dictionaries and use illustrations more often than these, in 
order to illustrate the reality denoted by the headword. In an encyclopaedic dictionary, 
factual information is sometimes presented clearly separated from the language-related 
information (see Figures 32.2 and 32.3 in Section 32.3.2). 


32.3.2 Types of Encyclopaedic and Cultural Information 
in Dictionaries 


Encyclopaedic and cultural information can be realized in different types of dictionary 
information. In addition, at the level of the macrostructure, proper names as headwords 
are to be seen as encyclopaedic information, and this is found both in general language dic- 
tionaries as well as encyclopaedic dictionaries. Since proper names refer to people, places, 
institutions, historical events, titles of works of art, literature, etc. (Svensén 2009: 296), such 
headword entries are by their very nature always encyclopaedic. The meaning of a headword
which is a proper name cannot be explained by means of a definition; only its function
as a reference to and identification of a particular object in the real world can be described.
Encyclopaedic and cultural information are often found in the context of explana- 
tions of meaning. This is because it is difficult to explain things which occur in the world, 


19 For the importance of the treatment of differences in cultural connotations in bilingual dictionaries,
see Rey (1991). 

20 For more classification options, see Hupka (1989b: 988f.); for a discussion of the distinction 
between dictionaries, encyclopaedias, and hybrid types, see Lara (1989) and Peeters (2000). 

21 For a critical evaluation of encyclopaedic dictionaries and their marketing strategies, see Landau
(2001: 151f.). 
elexiko
Dollar
Lesart: ‘Währungseinheit’
zur Übersichtsseite
Bedeutungserläuterung | Semantische Umgebung | Typische Verwendungen | Sinnverwandte Wörter | Besonderheiten des Gebrauchs | Grammatik

Bedeutungserläuterung
Mit Dollar bezeichnet man eine Währungseinheit, die in den USA,
Kanada und weiteren Staaten verwendet wird.
Belege anzeigen

Sachinformationen
Weitere Informationen
Dollar: [aus deutsch »Taler«] der, Währungseinheit der USA: 1
Dollar (US-$) = 100 Cents (c). Auch andere Länder bezeichnen
ihre Währung als Dollar, z.B. Australien, Kanada, Neuseeland,
Simbabwe, Namibia. Brockhaus in Text und Bild Edition (2002).
Mannheim (CD-ROM).

Wortklasse: Individuativum


FIGURE 32.2 Entry Dollar in elexiko


which characterize it, which belong to it, etc., without any reference to this reality. And 
so it is not only the semantic features of the denotative meaning which are contained in 
the dictionary definitions, but also encyclopaedic features (Wiegand 1989: 550f.). Hence
it is possible to either put the encyclopaedic information into additional comments 
(example 1) or include it directly in the formulation of the definition (example 2). 


(1)  structuralism 2.b. Linguistics. Applied to theories in which language is considered 
as a system or structure comprising elements at various phonological, grammati- 
cal, and semantic levels, esp. after the work of F. de Saussure (1857-1913). (OED 
online; 7 March 2012) 
wine, n.
View as: Outline | Full entry

Pronunciation: /waɪn/
Forms: OE-ME win, (ME uin), ME-15 wyn, ME-15 (16 Sc.) wyne (ME wyin, vyn, ME wijn(e, ME …
Etymology: Old English wín = Old Frisian, Old Saxon, Middle Low German, Middle Dutch wîn …

I.
a. The fermented juice of the grape used as a beverage.

It is essentially a dilute solution of alcohol, on the proportion of which in its composition depend its stimulating and intoxicating
properties. Wines are classed as red or white, dry or sweet, still or sparkling.


FIGURE 32.3 Entry wine in OED


(2) Austenian. Of or pertaining to Jane Austen, novelist, 1775-1817, or her writings. 
(OED online; 7 March 2012) 


In the explanations of meanings of technical terms, it is almost impossible to distinguish 
between linguistic information and factual information (example 3). 


(3) eclipse. 1.a. Astron. An interception or obscuration of the light of the sun, moon,
or other luminous body, by the intervention of some other body, either between 
it and the eye, or between the luminous body and that illuminated by it; as of the 
moon, by passing through the earth’s shadow; of the sun, by the moon coming 
between it and the observer; or of a satellite, by entering the shadow of its primary. 
(OED online; 7 March 2012) 


For particular headwords which are very culturally specific or which need to be
explained against the background of a particular ideology, linguistic and factual features
of meaning are likewise hardly distinguishable when definitions are formulated
which can be easily understood and which convey the meaning of the headword correctly
(example 4).


(4) nirvana. 1.a. Buddhism. The realization of the non-existence of self, leading to ces- 
sation of all entanglement and attachment in life; the state of being released from 
the effects of karma and the cycle of death and rebirth. (OED online; 7 March 2012)


In a number of dictionaries, especially encyclopaedic dictionaries, encyclopaedic addi- 
tions to the explanation of the meaning are indicated by headings (see Figure 32.2) or 
different fonts (see Figure 32.3); due to fewer space restrictions, this is easy to do in elec- 
tronic dictionaries. 

Cultural information in particular, but also encyclopaedic information, can appear 
with the collocators and examples of a headword (Hupka 1989b: 992). From the
wine [uncountable and countable]
1 an alcoholic drink made from GRAPEs, or a type of this drink:
   a glass of wine
   a delicious Californian wine
   red/white wine
   a bottle of red wine
   dry/sweet/sparkling wine
   a dry white wine

2 an alcoholic drink made from another fruit or plant


FIGURE 32.4 Entry wine in LDOCEonline


headword entry wine in LDOCEonline (see Figure 32.4), for example, the following
information about the object ‘wine’ is conveyed by the examples given and the adjectival 
collocators: wine (at least in the cultural circle in which the dictionary originated) is bot- 
tled and drunk from glasses, it is made in California and it is red or white, dry, sweet, or 
sparkling. 

A certain amount of information in the back- and front matter of printed dictionar- 
ies is by nature encyclopaedic. So, for example, American encyclopaedic dictionaries 
contain ‘nonlexical information, such as signs and symbols, locations of colleges and 
universities, maps, historical documents such as the Declaration of Independence or 
the Constitution of the United States, historical accounts, charts or surveys of language 
relationships, editorial advice and style sheets, and so on’ (Algeo 1990: 2004).


32.3.3 Encyclopaedic and Cultural Information in Printed 
and Electronic Dictionaries—Examples and Analysis 


In the lexicographical traditions of different languages, the tradition of encyclopaedic 
information in general language dictionaries and in particular the tradition of ency- 
clopaedic dictionaries have developed to varying degrees. In the American diction- 
ary tradition, for example, encyclopaedic dictionaries play an important role (Algeo 
1990: 2004f.), and also in the French and Italian dictionary tradition there are early 
relevant examples, whereas there are none in German-language lexicography (Hupka 
1989b: 993ff.). Even today, there are still many French encyclopaedic dictionaries, for 
example Dictionnaire Hachette or the multi-volume Grand Larousse encyclopédique en 
dix volumes. Online, Larousse offers parallel searches in a dictionary and an encyclopae- 
dia at <www.larousse.fr>, which constitutes a continuation of the publisher's encyclo- 
paedic dictionaries. 

For English, there is, for instance, Longman Dictionary of English Language and 
Culture (LDELC1), which combines a ‘complete language dictionary with 15,000 cultural 
and encyclopedic entries covering people, places, history, geography, the arts and popular
culture’ (LDELC: synopsis). LDOCEonline also contains encyclopaedic entries, and
in AHD numerous names of people and places are listed as headwords. An example of 
an English-language dictionary portal, which brings together different linguistic and 
encyclopaedic reference works, is Bartleby Reference.12 

A look at several online dictionaries in other European languages shows how vari- 
ably the option of networking with encyclopaedic reference works is used: while word 
entries in elexiko are linked to other internet sites (e.g. in the entry Europa, there is a link 
to the European Union website, <http://europa.eu/index_de.htm>), there are no comparable 
links in the Digitales Wörterbuch der deutschen Sprache. The Danish internet 
dictionary Den Danske Ordbog likewise does not contain any links, while the Dutch 
dictionary Algemeen Nederlands Woordenboek provides a link to Wikipedia, the Free 
Encyclopedia, under every headword. 

As well as encyclopaedic dictionaries and dictionary portals, learners’ dictionaries 
also contain encyclopaedic and especially cultural information. Here, as with bilingual 
dictionaries, it is important for those users who are not familiar with the foreign lan- 
guage that the dictionary convey cultural facts of the foreign language community in 
addition to purely linguistic information. It is useful to include these in the dictionary, 
since learners of a foreign language otherwise often have no simple and direct access 
to this information (Cop 1989: 765). It is for this reason that there are culture-specific 
comments in learners’ dictionaries (e.g. entries on topics such as Gymnasium [‘gram- 
mar school’] and Weihnachten [‘Christmas’] in HWDaF). As well as this, information 
on geography and culture is included in the front- and back matter (e.g. diagrams show- 
ing the school system in Germany, Austria, and Switzerland, in the same dictionary), 
and relevant headwords are also included (e.g. Father Christmas and Santa Claus in 
LDOCEonline). Scenic plates, which are integrated into the printed dictionary (so-called 
middle matter), likewise often convey encyclopaedic and cultural information (for an 
example of this, see Section 32.2.4). 

In more recent bilingual dictionaries, for instance Hueber Learner’s Dictionary 
German-English/English-German (2009), similar ways of conveying factual and cul- 
tural information can be observed. In this dictionary, under the headword Sack, for 
example, after the linguistic information, the following explanation appears: ‘the “yel- 
low sack” is a special kind of rubbish bag [in Germany] in which recyclable packaging is 
collected’ (2009: 354). 

To summarize, there is a continuing tradition of encyclopaedic information in both 
printed and electronic dictionaries. Particularly in the field of learner lexicography, such 
information is deliberately inserted in order to improve the usefulness of the dictionary. 
In the case of dictionaries on the internet, new ways of networking with encyclopaedic 
information (e.g. encyclopaedias) are also being tested. With that goes the development 
of new ways of presenting such information in both printed and electronic dictionaries. 


12 For more on linking linguistic and factual lexicographical reference works in dictionary portals, see 
Engelberg and Müller-Spitzer (2013). 
32.3.4 Encyclopaedic and Cultural Information 
in Dictionary Corpora 


As mentioned in Section 32.3.2, cultural as well as encyclopaedic information can appear 
in the collocators and examples of a headword. This is especially so if the dictionary is a 
corpus-based dictionary. In the texts of a well-balanced dictionary corpus (see Kupietz, 
this volume), many facts which relate to daily life or particular fields of knowledge are 
mentioned naturally. In the dictionary, therefore, the boundaries between knowledge of 
a word and knowledge of the world become blurred (Wegner 1989: 894). The example 
Akte (‘file’) from the German-language online dictionary elexiko shows this clearly: sta- 
tistically significant noun collocators of Akte from the dictionary corpus are, for exam- 
ple, Brief (‘letter’), Dokument (‘document’), Gutachten (‘report’), Protokoll (‘minutes’), 
Schriftstück (‘paper’), Unterlagen (‘documents’), Urkunde (‘document’), and also Behörde 
(‘authority’), Bundesanwaltschaft (‘Supreme Prosecution Office’), Geheimdienst 
(‘intelligence service’), Geheimpolizei (‘secret police’), Gericht (‘court’), Ministerium 
(‘ministry’), Staatsanwaltschaft (‘Public Prosecution Service’), Verfassungsschutz (‘Office for 
the Protection of the Constitution’). From this it is possible to ascertain which types of 
document are contained in files, and that files are official documents, which are drawn 
up and kept by particular authorities. This is world knowledge rather than linguistic 
knowledge. 

Rich cultural information can also be obtained from corpus texts. The example Café 
from the German dictionary elexiko contains collocators extracted from the corpus 
which clearly show that cafés in the German or Central European cultural region are 
used not just for drinking and eating, but also as a place for events (for a flea market, a 
reading, as a gallery) or meeting point (for customers, including regulars) (Klosa and 
Storjohann 2011: 62). Cultural information is also conveyed in the dictionary entry via 
the naming of the dishes (Frühstück [‘breakfast’], Kuchen [‘cake’], Torte [‘fancy cake’]), 
and drinks (Café au Lait, Café Latte, Cappuccino, Espresso, Kaffee [‘coffee’], Latte 
macchiato, Milchkaffee [‘white coffee’]) which are served in the café. 

Even if there is not as much space available in printed dictionaries as in online dic- 
tionaries, a corpus-based approach guarantees that important encyclopaedic and cul- 
tural information is also included in these dictionaries with reference to statistically 
significant collocators. Example (5) from LDOCEonline for the headword cafeteria 
shows this with the collocators factory and college.13 


(5) cafeteria: a restaurant, often in a factory, college etc, where you choose from foods 
that have already been cooked and carry your own food to a table 


13 See also the example wine in Section 32.3.2 from the same dictionary. 
32.3.5 Usage Research on Encyclopaedic and Cultural 
Information in Dictionaries 


Up until now, hardly any research in the field of dictionary usage has been carried out 
or published specifically on the use or presentation of encyclopaedic and cultural 
information in dictionaries. In an online study of the German-language internet dictionary 
elexiko, 194 out of 685 participants expected the explanation of the meaning of a head- 
word to be supported by factual information (Klosa et al. 2014: 302). It would now be 
possible to examine more closely the extent to which definitions should be formulated 
purely linguistically or including encyclopaedic and cultural information, regardless 
of particular user groups and particular usage situations. Another interesting ques- 
tion would be whether, for the average dictionary user, the difference between linguis- 
tic knowledge and factual knowledge plays a role when choosing a reference work. 
The issue of how to present encyclopaedic information appropriately is important for 
both printed and electronic dictionaries, and should be investigated accordingly. In the 
online study of elexiko, 414 of the 685 participants generally expected internet diction- 
aries to contain links to other (online) reference works (Klosa et al. 2014: 310). It would 
now be possible to check more closely whether and how dictionary users actually use 
links to other online reference works. The results of such studies would be expected to 
show on the whole that encyclopaedic elements in (both printed and electronic) dic- 
tionaries and the presentation of these could be better designed, in order to improve 
overall usability. 


32.4 CONCLUSION AND FUTURE 
DEVELOPMENT 


There is a long tradition of illustrations and both encyclopaedic and cultural infor- 
mation in printed general language dictionaries, and also in learners’ dictionaries 
and some bilingual dictionaries. This tradition has been taken up and developed further 
in electronic dictionaries, especially on the internet. Much here is still at an experimental 
stage: for example, the positioning of illustrations on the screen, how they 
are linked with the word entries, and how they can be used as a mode of access. The 
use of additional multimedia elements (e.g. videos, audio files) and links to exter- 
nal sources also needs to be developed. But even in the case of printed dictionaries, 
there are still conceptual challenges: the question of which factual information and 
how many and which illustrations belong in the dictionary has not yet been defini- 
tively answered and must be decided on again each time depending on the intended 
user group and function of the dictionary. In the last few years, there has been an 
impetus, particularly from the field of learners’ dictionaries, to establish how both 
illustrations and encyclopaedic and cultural information can enrich a dictionary. So 
that lexicographical practice in this area can be further developed in a positive way 
in the next few years, sound criticism of practice up until now as well as thorough 
research into recently developed types of information and forms of presentation in 
both printed and electronic dictionaries are essential. 


CHAPTER 33 




MAKING DECISIONS ABOUT 
INCLUSION AND EXCLUSION 




GRAEME DIAMOND 


33.1 INTRODUCTION 


This chapter outlines the criteria used in assessing potential inclusions in the Oxford 
English Dictionary (OED), and summarizes the resources used in applying these criteria. 
While the focus of the chapter is the policy in use at the OED, where the range of vari- 
ables to be considered in making a decision on inclusion or exclusion is at its greatest, 
reference is also made to how decisions for a dictionary of current English may differ. 

All English dictionaries, regardless of size and purpose, tend to have at their heart virtually 
the same core vocabulary: words and meanings which are inarguably part of the 
language, the everyday materials from which speakers construct meaningful discourse. 
Such vocabulary will have a place in every dictionary, from the most compact learner's 
dictionary to a large historical dictionary such as the OED. 

The number of words which can expect such unanimous approval for inclusion is 
perhaps surprisingly small: analysis of the Oxford English Corpus (OEC) suggests that 
1,000 lemmas account for 75 per cent of the written English contained in the corpus;1 
the Oxford Advanced Learner's Dictionary identifies 3,000 items of vocabulary which 
a prospective English-speaker should prioritize in order to understand and be understood 
effectively.2 Even the smallest dictionary therefore prompts questions of what, 
outside of such indisputably deserving candidates, should be included: the Oxford Mini 
Dictionary, for example, contains 90,000 entries; even Merriam-Webster's Dictionary for 
Children has over 36,000. 
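
As a rough illustration of the kind of calculation that lies behind a statement such as ‘1,000 lemmas account for 75 per cent’ of a corpus, the following Python sketch computes the share of all tokens covered by the most frequent lemmas. The function name and the toy counts are invented purely for illustration; they are not OEC data, and the sketch makes no claim about how the OEC figure itself was derived.

```python
from collections import Counter


def coverage_of_top_lemmas(lemma_counts: Counter, top_n: int) -> float:
    """Return the share of all tokens accounted for by the top_n most frequent lemmas."""
    total = sum(lemma_counts.values())
    if total == 0:
        return 0.0
    top = lemma_counts.most_common(top_n)
    return sum(count for _, count in top) / total


# Toy counts, invented for illustration only (hypothetical, not corpus data).
toy_counts = Counter({"the": 50, "be": 30, "of": 10, "dictionary": 6, "chimera": 4})

share = coverage_of_top_lemmas(toy_counts, 2)
print(f"Top 2 lemmas cover {share:.0%} of tokens")
```

Run over a real lemmatized corpus, the same function could be called with increasing values of top_n to see how quickly coverage approaches a given proportion.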

For ongoing dictionary projects, of course, a headword list is already established, to 
which additions are made over time to reflect language change; in the case of most print 


1 <http://oxforddictionaries.com/words/the-oec-facts-about-the-language> (accessed 18 April 2012). 
2 <http://oald8.oxfordlearnersdictionaries.com/oxford3000/> (accessed 18 April 2012). 
dictionaries, corresponding decisions are also made on what to omit to make room for such 
new arrivals. Decisions in this context therefore revolve around which new (or previously 
overlooked) items of vocabulary deserve coverage, and (now less frequently, as dictionaries 
generally move towards being primarily an online resource, without the space limitations of 
print) also around identifying obsolescent or less relevant entries which can be removed. In 
either case the same factors and considerations must be taken into account, and would be 
equally applicable to the creation of a new dictionary from scratch. 

For most modern dictionaries of current English, however many entries are included, 
typically the headword list is primarily shaped by corpus data. Interrogation of corpora, 
whether already existing or constructed especially for the dictionary in question (see 
Section 33.5) can give quantitative information on current frequency, which is used 
in guiding (or driving) decisions on whether a word should be included (see Section 
33.6.1.2). This synchronic approach is well suited to dictionaries describing the language 
as it is used now, but methods of gathering historical data are also needed to formulate a 
coherent inclusion policy for a historical dictionary. 

The principles outlined in this chapter reflect the criteria currently in use when 
deciding whether to include an item in the third, online, edition of the Oxford English 
Dictionary (OED3). The size and scope of this historical dictionary means the range of 
variables to be considered in making a decision whether to include or exclude a lexical 
item is at its greatest: old and obsolete terms with no place in a dictionary of current 
English are considered alongside the most recent buzzwords, and considerations such 
as etymological importance or significance in the overall development of the language 
are at their most complex and acute. These principles are, however, broadly applicable to 
any general dictionary, although specifics of quantity and degree may vary. 


33.2 COVERAGE IN THE OED: OVERVIEW 


As the Preface to the third edition of the OED makes clear, it is a myth ‘that it includes 
every word, and every meaning of every word, which has ever formed part of the English 
language. Such an objective could never be fully achieved. The present revision gives the 
editors the opportunity to add many terms which have been overlooked in the past, but 
it should be understood that fully comprehensive coverage of all elements of the language 
is a chimera’. The aim, rather, is to be ‘comprehensive within reasonable bounds’.4 


3 Some recently created online dictionaries such as Urban Dictionary 
(<http://www.urbandictionary.com/>) do not have inclusion criteria as such, and are open 
to all users to contribute entries without restriction. Although the results of such 
endeavours are often lively and entertaining, they frequently consist largely of words 
which are virtually never used in the normal course of English usage (see Section 33.3.3). 
Some others which are open to user contributions, such as Wiktionary, nevertheless operate 
using criteria very similar in character to those outlined in this chapter (see 
<http://en.wiktionary.org/wiki/Wiktionary:Criteria_for_inclusion>). 

4 <http://www.oed.com/page/oed3preface/Preface+to+the+Third+Edition+of+the+OED> (accessed 
18 April 2012). 
‘Reasonable’ is, of course, a generously elastic term, and to get a more precise idea of 
the challenges faced in deciding what to add to a dictionary of this size, it is worth taking a 
look at some statistics. In the publishing year June 2011 to March 2012, encompassing four 
quarterly updates to OED Online, 6,125 new items were added to the existing content of the 
dictionary. (‘Item’ in this context means any new word, new meaning of an existing word, 
or new compound or phrase listed under an existing headword.) For these four updates, 
a total of 8,453 items were considered for inclusion: just over 72 per cent of the items sug- 
gested for inclusion during work on these four updates were drafted and now appear in 
OED Online.5 The other 28 per cent, or 2,328 ‘rejected’ suggestions, are not consigned to 
oblivion: they are kept on file for monitoring in the future, but it is likely to be some time 
before their case is reassessed, unless a relevant development in current English prompts 
a re-examination of related vocabulary. This was the case with the adjective subprime. 
A financial meaning, designating a lending rate lower than the prime rate, and typically 
offered only to the most desirable borrowers, was not common or widespread enough to 
be included in the second edition of the dictionary (OED2) or the Additions Series, but is 
now included in OED3 after the advent of a newer sense, designating a loan made (often on 
unfavourable terms) to a borrower who does not qualify for other loans because of a poor 
credit history or other circumstance. The older sense, although no more common than it 
was twenty years ago, now provides interesting and valuable historical background to the 
newer: the two senses are of almost opposite meaning. 
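
The proportions quoted earlier in this paragraph can be checked with simple arithmetic. The short Python sketch below uses only the two totals given in the text (8,453 items considered, 6,125 added); the variable names are illustrative.

```python
# Totals as stated in the text for the four quarterly updates.
considered = 8453
added = 6125

# Suggestions not taken forward, and the two proportions.
rejected = considered - added
accepted_pct = 100 * added / considered
rejected_pct = 100 * rejected / considered

print(f"rejected: {rejected}")
print(f"accepted: {accepted_pct:.1f} per cent")
print(f"rejected: {rejected_pct:.1f} per cent")
```

The accepted share comes out a little above 72 per cent, and the rejected share rounds to 28 per cent, in line with the figures in the text.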

The sources of these suggestions are varied: while, as noted in Section 33.1, many 
dictionaries of current English rely primarily or solely on corpora to generate lists 
of candidates for inclusion, and the OED makes much use of the OEC in this regard, 
we also systematically analyse the OED’s own Incomings database (see Section 33.4) 
and other dictionaries. Alongside this, the depth of research and analysis involved in 
revising OED2 gives unparalleled insight into words and meanings from all historical 
periods which are not currently covered by OED, and a large proportion of the suggestions 
are generated by this means. 

Some reasons for the rejection of suggestions are straightforward: the suggestion 
may have been made speculatively or mistakenly without noticing the item in question 
was already covered by the dictionary, for instance, or have been based on a misun- 
derstanding or misreading of the evidence supposedly supporting its inclusion. In the 
vast majority of cases, however, a rejected suggestion will have been a valid request to 
investigate a vocabulary item with a genuine history of use (of whatever length) among 
some proportion of English speakers (however large or small), which is not covered by 
the OED. Without limits on time or on numbers of editorial staff, entries could in fact 
have been created for such items: a definition could be given, and supporting evidence 
(however scant) given in the form of a sequence of illustrative quotations. Against the 
background of so many other competing suggestions, however, a system for prioritizing 
the candidates for inclusion which would most enrich the dictionary’s content and be of 
most utility to the user is necessary. 


5 Source for these figures: internal OED records of the suggestions made and the action taken for 
each one. 




33.3 INCLUSION CRITERIA 


This section gives a summary of the general principles and criteria underlying each indi- 
vidual decision to include or exclude a particular item. For convenience, the term ‘word’ 
or ‘item’ will be used in this chapter, although as previously stated, the task of adding 
‘new words’ to the OED is not limited to very recent coinages requiring a brand new 
headword entry in the dictionary: a ‘new item’ is any word, meaning, compound, or 
phrase which was not previously covered by OED2 or the three supplementary Additions 
volumes, whether that be, say, a fifteenth-century term which has only now come to 
light or a new formation relating to a recent technological development. In dealing with 
such a variety of material, the OED’s inclusion policy needs to be flexible, but whatever 
the nature of the particular case, any decision on inclusion will be made with reference 
to these same criteria. At the same time, no hard and fast rules are outlined in this sec- 
tion, nor would it be helpful to do so. For every item a distinct set of factors will be in 
operation, and a decision will be based on an assessment of them all in combination. For 
numerical criteria, approximate figures indicating what might be expected for a sug- 
gested item to be included are given, but these are intended as guidelines only. 


33.3.1 Evidence 


The OED’s principal aim is to describe the English language as it is (or was) used. The 
primary concern is therefore to determine whether there is a convincing body of evi- 
dence that a word has ever actually been in use. In practice, this means that several 
examples of the word being used in a variety of sources in written (preferably published) 
English are required, so that the evidence on which the decision is based (some of which 
will get used in the OED3 quotation paragraph) is verifiable by independent reference in 
the future. Due to the different types of vocabulary under consideration, there is no rigid 
numerical threshold for the number of examples needed. However, in an age of prolifer- 
ating online resources, making available to the lexicographer texts from many different 
chronological periods, geographical regions, and linguistic registers (see Section 33.4), 
and in the absence of a supporting reason for inclusion stemming from one or more 
of the other factors listed in the criteria in this section, at least 100 separate, independ- 
ent uses of a word would be required to consider inclusion justifiable on the basis of 
frequency of use. This contrasts sharply with a rule of thumb from the pre-Internet age 
(still occasionally referred to as current OED policy6), when the range of sources from 
which evidence could be drawn was far smaller, of five examples of use in five different 
sources, over a period of at least five years. 
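
The older rule of thumb lends itself to a mechanical check, sketched below in Python. The Quotation record and the function name are hypothetical, chosen for illustration only; the sketch does not represent any tool actually used by OED editors, and the current guideline of around 100 independent uses is, as noted, applied flexibly rather than as a rigid threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quotation:
    """One recorded use of a word (hypothetical record for illustration)."""
    year: int
    source: str


def meets_old_rule_of_thumb(quotations, min_examples=5, min_sources=5, min_span_years=5):
    """Check: five examples, in five different sources, over at least five years."""
    if not quotations or len(quotations) < min_examples:
        return False
    distinct_sources = {q.source for q in quotations}
    years = [q.year for q in quotations]
    span = max(years) - min(years)
    return len(distinct_sources) >= min_sources and span >= min_span_years


# Invented sample evidence: six uses, six sources, spanning 1990 to 1995.
sample = [Quotation(1990 + i, f"source {i}") for i in range(6)]
print(meets_old_rule_of_thumb(sample))
```

The thresholds are parameters rather than constants, since the text stresses that numerical criteria are guidelines only.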


6 <http://www.macmillandictionaries.com/MED-Magazine/May2006/38-New-Word.htm#2> 
(accessed 19 April 2012). 
33.3.2 Longevity 


Access to a far wider range of resources has also led to modification of the amount 
of time a word usually needs to have been in use before it is considered for inclusion, 
although to a lesser degree: in general a minimum requirement would be that a word has 
a lifespan of at least ten years, so that, in the case of a recent coinage, the entry may bring 
the necessary perspective to bear in order to make an authoritative judgement possi- 
ble on such aspects as figurative usage, grammatical changes (such as a noun begin- 
ning to be used as a verb), and so on. Additionally, no term is ever removed from the 
OED, so time is needed to determine if a term is very ephemeral, engendered by a single, 
soon-to-be-forgotten event, and never occupying a permanent place in the lexicon. This 
may perhaps seem unnecessarily to delay coverage of very recent linguistic develop- 
ments, although the research undertaken in the course of assessing an item for inclusion 
quite routinely uncovers evidence of a greater vintage than one might first expect: the 
OED’s entry for OMG (‘oh my God’), for example, a term much associated with informal 
online communication, dates the term back to 1917, in a personal letter (to Winston 
Churchill), and even the first quotation in an electronic context comes from as long 
ago as 1994; credit crunch, a term which feels strongly associated with the latter half of 
the first decade of the twenty-first century, in fact dates from 1966. In addition, as ever, 
exceptions can be made for compelling cases of ubiquity or cultural significance: pod- 
cast, which from its beginnings in 2004 quickly became a standard item of vocabulary, 
first appeared in OED3 in 2008. Even if the term later becomes dated, the entry can be 
adjusted accordingly to reflect this; there is no doubt that it deserves its place in a histor- 
ical record of English vocabulary. Equally, older terms which did not last ten years may 
also be included if they are of historical or linguistic importance (most commonly this 
is applied to the very first appearance of an established English word, even if this is an 
isolated example), or if they shed light on an aspect of the development of the language 
(a rare term which is the etymological source of a later, much better-known word, for 
example, or a short-lived piece of vocabulary relating to outdated technology which was 
nevertheless influential on later developments). 

At the other end of the spectrum, words with a particularly long life (typically 100 or 
more years) will tend to be prioritized for inclusion even if evidence of frequent use is 
somewhat lacking; the lifespan alone suggests a relatively important word which has 
proved useful to speakers and writers of English over a considerable period of time. The 
evidence would, however, need to indicate some continuity of usage; two isolated usages 
more than a hundred years apart do not tell us the same about the importance of a word 
to the language as low-level, but continued, usage, over the same period. 
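
The contrast between isolated and continuous usage can be made concrete by looking at the gaps between dated attestations. The Python sketch below is a minimal illustration only: the function names are hypothetical, and the 50-year maximum gap is an arbitrary assumption introduced for the example, since the text gives no figure for what counts as continuity.

```python
def largest_gap(years):
    """Return the longest interval, in years, between consecutive attestations."""
    ordered = sorted(set(years))
    return max((later - earlier for earlier, later in zip(ordered, ordered[1:])), default=0)


def shows_continuity(years, min_lifespan=100, max_gap=50):
    """Long-lived AND without very long silences (max_gap is an assumed value)."""
    if not years:
        return False
    lifespan = max(years) - min(years)
    return lifespan >= min_lifespan and largest_gap(years) <= max_gap


# Two isolated uses more than a century apart, versus low-level continued use.
isolated = [1700, 1810]
continued = [1700, 1730, 1765, 1800, 1812]

print(shows_continuity(isolated), shows_continuity(continued))
```

Both sets of dates span more than a hundred years, but only the second shows the low-level, continued usage that the text treats as evidence of a word's importance.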


33.3.3 Naturalization 


To be entered in the OED, a word needs to have reached a level of general currency where 
it is unselfconsciously used with the expectation of being understood: that is, we look 
for examples of uses of a word that are not immediately followed by an explanation of its 
meaning for the benefit of the reader. This would apply equally to loanwords from other 
languages and new English formations unfamiliar to a general English-speaking readership. 
An example of the latter is seagull manager, a humorous term for a particularly 
annoying type of office boss, which appears in print at least as early as 1988 but never without 
bracketed or glossarial explanatory text containing a variation on ‘a boss who flies in, 
makes a lot of noise, craps on everything and leaves’; natural, unselfconscious usage in 
which people actually refer to others as seagull managers is virtually non-existent. 

Abbreviations, especially of the names of organizations, projects, and so on, also tend 
to find it harder to meet this criterion. Aside from very familiar abbreviations (OMG, 
NASCAR, 4WD, etc.), such items are usually accompanied by an explanatory expan- 
sion to the full form in question when appearing for the first time in any given piece of 
writing. 


33.3.4 Lexicographical Significance 


A reasonably substantial number of items are included less because of their own claims 
to widespread, long-standing, naturalized use, and more because of what they contrib- 
ute either to our understanding of a particular word, or of the language as a whole, or to 
the cohesion and utility of the dictionary’s coverage. As noted at the start of this section, 
the OED seeks to include the first use in English of any word already covered by the dic- 
tionary, as an important matter of historical record, and regardless of how badly attested 
it is. The revised OED3 entry for the adjective designing, for example, now includes a 
new, obsolete, first sense supported by a single piece of evidence from 1614, and relating 
to the verb design in the sense ‘designate, mark’ rather than the more usual ‘craft, plan, 
scheme’, as shown in the later, much more common senses (‘designing men in positions 
of power’, ‘a designing mind’, etc.). This minor sense nevertheless illustrates an interesting 
point about the false starts a word may make before becoming established in the 
language. Relatedly, this principle can extend to providing information about a family of 
words: to take a simple example, an entry for the poorly attested noun locomoving (first 
quotation 1704) was included because it pre-dated OED2’s existing entry for the more 
common (though still not hugely familiar to most) verb locomove (1792), and provides 
the earliest recorded evidence for the set of formations deriving from loco- and the verb 
move and its derivatives, which in turn antedate the verb locomote (1831). In this way the 
inclusion of a term which is really quite rare (appropriately labelled as such) can shed light 
on the development of a whole group of etymologically or semantically associated words. 


33.3.5 Other Favourable Factors 


Alongside these relatively discrete and measurable characteristics, there may be other 
elements to take into account when assessing a word's suitability for inclusion, typically 
by applying other criteria more flexibly in recognition of mitigating factors relating to 
the type of vocabulary to which the word belongs. A piece of teen slang or coarse slang, 
for example, or something from other registers of English which are poorly represented 
in print and published sources, would require fewer examples to meet the criterion for 
evidence of use. Similarly, items dating from before 1800—a period from which there 
is simply less surviving written material to survey—but which die out before the nine- 
teenth century, where the documentary record available to editors becomes much 
richer, would have less to do to demonstrate satisfactory levels of evidence for inclusion. 

Other such factors relate more to lexicographical significance and OED2’s existing 
coverage than to the type of item in question, and particularly to items which com- 
plete a story only partially told by OED2. For example, less evidence would be required 
(although still enough to make a convincing case for reasonably sustained and wide- 
spread use) to include a current sense of a word which OED2 currently describes as 
obsolete, to avoid giving a misleading picture of the word’s status; or for one which 
has superseded, or was superseded by, a word already in OED2; or for a word which 
remedies an oversight, such as one used as defining vocabulary by OED2 but not itself 
covered by it. 


33.3.6 Exclusions 


Typically, behind the decisions made on inclusion by any particular dictionary lies a less 
evaluative set of conditions: particular categories of item which the dictionary in ques- 
tion does not undertake to cover. This may be as simple (and logical) as a dictionary of 
current English excluding obsolete or archaic words, or a dictionary of specialist terms 
(see Becker, in this volume) considering only items relating to the field, or regional vari- 
ety of the language, with which it deals. In other cases more complex concerns must 
be addressed, such as whether to include taboo terms (a particular issue for children’s 
and schools dictionaries, but also for the first edition of the OED and its supplements; 
see Mugglestone, this volume). Other considerations may be more lexicographical than 
cultural, such as the level of treatment given to expressions consisting of more than one 
word,7 especially if of transparent meaning; this might apply, for instance, to transparent 
compounds (e.g. tree branch), idiomatic phrases of readily discernible meaning (to play 
one’s part), or phrasal verbs which operate as a semantic unit (to throw up). 

However, only a very specific type of item is routinely excluded from OED3: unlike 
some other dictionaries (including the Oxford Dictionary of English, ODE), entries for 
names of people, places, and so on are not included, unless the name has acquired an 
extended or allusive sense. So the only sense of the noun America covered is an allusive 
one denoting an idealized destination; information about the land mass is given in the 
etymology of the entry. 


7 See discussion of ‘multiword expressions’ and the wider survey of decisions on which categories of 
item to include in Atkins and Rundell (2008: 178-89). 
33.4 RESOURCES USED FOR ASSESSMENT 
OF SUGGESTIONS 


Of course knowing what the criteria are is one thing: selecting the right tools to measure 
an item against them is equally important. All of the criteria above are predicated on 
an assessment of what the evidence tells us about a word’s frequency, lifespan, level of 
general currency, and status relative to other items already covered by the dictionary. 
But how does one come by such evidence? Where should one look to gather information 
about English usage from the eleventh century to the twenty-first, in different parts of 
the world, and in different types of speech and writing? 

There is no single answer, of course, at least for the modern OED. Electronic sources, 
and in particular, searchable commercial Internet archives of printed texts, from all 
periods of English usage, have vastly expanded what is available over the past three dec- 
ades, but for more than a century the principal means of assessing the evidence support- 
ing a word’s case for inclusion was the dictionary’s own extensive files of quotation slips, 
alphabetically filed according to the keyword to which they relate. The text on each slip 
typically resembles an illustrative quotation from an OED quotation paragraph: a use of 
the word in question, in enough context to give an idea of its meaning, and sometimes 
also extra information such as a suggested definition or etymology. Many were submit- 
ted by paid or voluntary readers on the dictionary’s Reading Programmes; others by dic- 
tionary editors themselves, and by interested members of the public. Scrutiny of these 
slips was for a long time the only way of determining how frequently used or long-lived 
a term was, and they still play a significant, if diminished, role in the assessment process: 
the texts selected for reading by the dictionary’s Reading Programmes, in the past and 
now, are often of a type not available from commercial Internet databases—television 
and film scripts, specialist magazines, and so on. 

Since the start of the 1990s, quotations gathered by the Reading Programmes have 
chiefly been keyed and stored electronically in the Incomings database. This database, 
now containing over 3 million quotations, combines the benefits of the slips—index- 
ing by keyword, the variety and obscurity of some of the sources covered—with the 
advantages of electronic searchability. The entire database, or a subset of it defined by 
date of quotation, region of origin, etc., can be interrogated using sophisticated search 
strategies. 
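
The kind of subsetting described here can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a description of the Incomings software: the `Quotation` record, its field names, the `select` function, and the sample rows are all hypothetical, invented only to show filtering by keyword, date of quotation, and region of origin.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Quotation:
    """Hypothetical record; field names are illustrative, not the real schema."""
    keyword: str
    year: int
    region: str
    text: str


def select(quotations: Iterable[Quotation], keyword: str,
           start_year: Optional[int] = None, end_year: Optional[int] = None,
           region: Optional[str] = None) -> List[Quotation]:
    """Return quotations for `keyword`, optionally limited by date and region."""
    hits = [
        q for q in quotations
        if q.keyword == keyword
        and (start_year is None or q.year >= start_year)
        and (end_year is None or q.year <= end_year)
        and (region is None or q.region == region)
    ]
    return sorted(hits, key=lambda q: q.year)  # oldest evidence first


# Invented sample rows, for illustration only.
SAMPLE = [
    Quotation("hashtag", 2011, "UK", "(invented example text)"),
    Quotation("hashtag", 2009, "USA", "(invented example text)"),
    Quotation("super PAC", 2012, "USA", "(invented example text)"),
]

for q in select(SAMPLE, "hashtag", start_year=2009, end_year=2010):
    print(q.year, q.region)  # prints: 2009 USA
```

Sorting the result by year puts the earliest evidence first, which is usually the first thing an editor assessing lifespan would want to see.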

The Oxford English Corpus is the primary engine by which new material is identified 
and assessed for current Oxford dictionaries such as the Oxford Dictionary of English 
(ODE) or its online equivalent Oxford Dictionaries Online (ODO). It contains over 2 
billion words of English, dating from 2000 onwards and culled mainly from Internet 
sources, and gives a balanced and accurate picture of current English usage. 

For items belonging to specific varieties of English, other historical dictionaries often 
provide a basis for assessing the evidence available. Most are now available to the OED in 
searchable electronic format, either on the Internet, like the Middle English Dictionary, 
Dictionary of the Older Scottish Tongue, and Jonathon Green's Dictionary of Slang, or 
as full-text databases available in-house to OED staff, such as the Dictionary of South 
African English on Historical Principles and the Historical Dictionary of American Slang. 

While less focused than region- or register-specific historical dictionaries, commer- 
cially available Internet databases often also display a concentration on a particular type 
of English in the sources that they make available. It is necessary to consult several to get 
an overall picture of a word’s frequency, distribution, and lifespan. Some of those most 
frequently used in assessing an item for inclusion in the OED are listed below, together 
with a brief summary of their content. (Typically a subscription is required for access to 
these databases, with the notable exception of Google Books.) 


Early English Books Online (EEBO), a digital collection of facsimiles of nearly all 
works in English published between 1473 and 1700, although only some are search- 
able as full texts. 

Eighteenth Century Collections Online (ECCO), containing over 180,000 fully 
searchable titles from the eighteenth century. 

Google Books, a vast repository of millions of scanned print titles from the fifteenth 
century onwards. Not all of these are searchable, however, and fewer still are view- 
able as full-text facsimiles, depending on copyright issues. 

Nexis, including electronic-text equivalents (rather than scanned facsimiles) of 
over 23,000 newspaper and other periodical sources from the 1970s onwards. Over 
time transcripts of television news shows and content from selected blogs and 
other web pages have also been added. 

Digitized collections of facsimile historical newspapers, such as Proquest Historical 
Newspapers (British and American newspapers from 1791 to the present day) and 
Gale News Vault (chiefly British newspapers from 1604 to the present day, with 
some nineteenth-century American content). 

JSTOR, a collection of digitized academic journals from 1665 to the twenty-first 
century. 


This summary is not exhaustive, but provides an insight into the variety of sources which 
must be consulted to get a full idea of an item's history of usage. Each type of source has 
drawbacks, aside from any limits on the type of text represented. For Internet databases, 
problems arise such as the unreliability of the results given when Optical Character 
Recognition technology is used to produce searchable full-text versions of scanned doc- 
uments (mis-hits for a search term are frequent on ECCO, Google Books, and digitized 
newspapers, for example), so that the number of results returned for a search may sig- 
nificantly undercount or overcount the actual numbers present. Overcounting can also 
result when the level of syndication of a single newspaper article artificially inflates the 
results totals given by newspaper databases such as Nexis. Similar caution is needed in 
assessing the evidence presented by other historical dictionaries, which might unnatu- 
rally flatten the picture: for example, one item may occur only five times in English in 
all, and for completeness, the dictionary might include all the evidence it has found, 
while in contrast, for an extremely well-evidenced item, quotations would be selected 
from the mass of those available—in all probability this will result in an entry with 
around five quotations, which looks very similar to the much rarer item. The OED's 
own Incomings database and slips files, owing to the circumstances of their creation, are 
the most reliable sources, targeted precisely at the needs of the OED and other Oxford 
dictionaries: Incomings alone represents the accumulation of over three million indi- 
vidual decisions by readers trained as to what would be of value to the dictionary. But it 
is dwarfed in terms of the number of words it contains by even the most modest of the 
commercial Internet databases, and it by no means forms a balanced corpus of natural 
English usage, biased by its very nature towards content which struck a reader as unu- 
sual or noteworthy. 

Nevertheless, as long as such caveats are borne in mind, by using many resources 
together, or if there are practical limitations on the resources available, by making judi- 
cious and intelligent use of them (see Brinton, this volume), the history of usage of a 
particular item can be pieced together accurately. 
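
One of the caveats above, overcounting caused by syndication, lends itself to a small worked example. The following Python sketch assumes that search hits have been exported as (year, source, text) tuples; that record shape, the function names, and the sample hits are hypothetical. It treats two hits as the same article only when their text is identical after crude normalisation, so it would not catch edited reprints or OCR mis-hits, which still need checking by eye.

```python
import re
from typing import Iterable, List, Tuple

Hit = Tuple[int, str, str]  # (year, source, text); a hypothetical record shape


def normalise(text: str) -> str:
    """Lower-case the text and keep only letters and digits, single-spaced."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def independent_hits(hits: Iterable[Hit]) -> List[Hit]:
    """Keep the earliest copy of each distinct text, dropping later reprints."""
    earliest = {}
    for hit in sorted(hits):           # sorts by year first, so earliest wins
        earliest.setdefault(normalise(hit[2]), hit)
    return sorted(earliest.values())


# Invented hits: the first two are the same article run by two newspapers.
RAW = [
    (2011, "Paper A", "A new super PAC was formed this week."),
    (2011, "Paper B", "A new super PAC was formed this week"),
    (2012, "Paper C", "Critics say the super PAC skirts the rules."),
]

print(len(RAW), len(independent_hits(RAW)))  # prints: 3 2
```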


33.5 CASE STUDIES 


The best way to illuminate the rather abstract summaries given in the foregoing sections 
is by specific reference to particular cases. In this section three different types of vocabu- 
lary item will be examined against the criteria given in Section 33.3 by using some of the 
resources mentioned in Section 33.4, together with conclusions on whether such an item 
would then be included in the OED, and a brief commentary on how decisions made by 
a dictionary of current English might differ. 


33.5.1 Super PAC 


Rising to prominence in US politics in 2010, in the wake of a Supreme Court rul- 
ing which lifted restrictions on independent political spending by corporations and 
unions, super PAC denotes a political action committee, or PAC, which is subject to 
fewer restrictions on certain types of fund-raising or expenditure than an ordinary 
PAC. The 2010 date, if genuinely the first appearance of the term, seems to argue against 
inclusion in the OED until the term is more established (see Section 33.3.2), and the 
OED’s Incomings database appears to bear out the idea of a late date: it contains five 
uses (from the USA and the UK) from 2011 and 2012. On this basis alone, the term 
would need to demonstrate some staying power before it would be included in the 
OED (although such a result, alongside just over 300 occurrences in the OEC, would 
be enough to suggest that inclusion in ODO or another dictionary of current English 
would be appropriate); other resources would need to support its case, especially in 
terms of longevity. 

Turning to the largest single source of data currently available, a simple search for 
‘super PAC’ (in singular and plural) on Google Books gives 1,300 results,⁸ on the face of 
it a significant level of frequency (see Section 33.3.1). Even a cursory examination of the 
results shows this is not a wholly accurate count, however: the first results page alone 
reveals hits for, among other things, the computer game ‘Super Pac-Man’. The majority 
of the results are correct, though, and it is a fair assumption (borne out by further, more 
detailed checking) that there are more than 100 separate independent uses of the correct 
sense of super PAC represented on Google Books. 
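
A first pass at screening out the most obvious false hits can be sketched with a pattern match. The Python example below assumes the hits are available as plain-text snippets (the snippets shown are invented) and uses a hypothetical helper, `is_candidate`. It only rejects forms followed by a hyphen, as in the game title; anything it accepts would still need to be read in context before being counted as an independent use.

```python
import re

# 'super PAC' or 'super PACs', in any case, written solid, spaced, or hyphenated,
# but rejected when a hyphen follows, as in 'Super Pac-Man'.
SUPER_PAC = re.compile(r"\bsuper[ -]?pacs?\b(?!-)", re.IGNORECASE)


def is_candidate(snippet: str) -> bool:
    """True if the snippet contains a plausible use of the term."""
    return SUPER_PAC.search(snippet) is not None


# Invented snippets, for illustration only.
snippets = [
    "Donations to the super PAC doubled.",
    "Several super PACs backed the candidate.",
    "He spent the afternoon playing Super Pac-Man.",
]
print([is_candidate(s) for s in snippets])  # prints: [True, True, False]
```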

By using the ‘custom date range’ function on Google Books we can further investigate 
the age of the term. Were there preliminary uses of the term before it shot to prominence 
in 2010? That this is the case quickly becomes apparent. Although the date attributed to 
its results by Google Books is not always reliable (later, revised editions of a work may 
bear the date of the first edition, for example, or an issue of a journal may bear the date 
of that journal's first ever issue), it does not take long to locate an example from 1984 
in a text which is viewable (and verifiable) as a full-text facsimile, plus other potential 
examples, viewable only as ‘snippets’ or small chunks of text, from as early as 1980. For 
the purposes of an OED entry such ‘snippets’ would require independent verification 
by checking in a library; one cannot be sure that the text displayed in a Google Books 
snippet truly belongs to the text it claims to. Regardless of this, it is clear that the term is 
of a greater age than was first suspected, and quick reference to the Nexis database con- 
firms this: it contains a 1982 use from the US magazine Newsweek. Nexis, by the nature 
of the sources it makes available (newspapers, blogs, TV transcripts, and so on) is, like 
the OEC, an excellent indicator of a term's penetration into normal, natural English 
intended for a general audience. If a term appears in significant numbers on Nexis 
(without accompanying explanations of its meaning), it is a strong sign that a ‘typical’ 
English-speaking reader is expected to view this as an unremarkable item of vocabulary, 
the meaning of which is well known. Alongside the early date, therefore, Nexis can pro- 
vide confirmation of the term’s naturalization in the language. Even the earliest, 1982, 
instance is not directly glossed, although it is explained by its context, and there are over 
3,000 other uses of the word (mostly correct, rather than references to Pac-Man), argu- 
ing persuasively for inclusion in the OED on the basis not only of widespread use but also 
sufficient longevity and unselfconscious usage. 


33.5.2 Hashtag 


Hashtag is used to denote the hash or # symbol in the context of its use as a marker of sig- 
nificant keywords or topics in a message (or tweet) posted on the social networking service 
Twitter, and also on messages in some other discussion forums, and by extension for the 
word or phrase marked in this way. Hashtags are used to categorize messages on the service, 
and those that are much used in a short space of time are identified as trending as popular 
talking points. 


8 Figures given in this section were correct as of 21 April 2012. 

As even the short paragraph above demonstrates, Twitter brings with it a whole vocab- 
ulary relating to its use. The service is much used by, among others, media and PR pro- 
fessionals, which has resulted in perhaps greater exposure in traditional media for terms 
relating to Twitter than for many other Internet-based phenomena. Checking the OED’s 
Incomings (for solid, hyphenated, and two-word forms) backs this up: there are twenty-one 
quotations for hashtag, ranging in date from 2009 to 2012. Nearly all are from newspapers 
or magazines. The OEC, which is primarily constructed from web-based sources, gives fur- 
ther support: there are 258 occurrences there, and a further 900 on OEC’s supplementary 
corpus of material drawn from Twitter itself. 

It seems safe, then, to conclude that the term forms a part of current English, and as 
one might expect, all the terms mentioned above (hashtag itself, tweet, trend) are covered 
by ODO. The question in terms of inclusion in OED centres more on the word’s lifespan. 
Twitter was only launched publicly in 2006 (as ‘Twttr’),⁹ and the first use of a hashtag on 
the service was in 2007.¹⁰ Both the service and the word hashtag took some time to come 
to more general notice, as the earliest date of 2009 on Incomings indicates (even in the web 
sources represented on OEC, only two uses date from 2008; the rest are from later). It is also 
noteworthy that all of the 2009 quotations on Incomings, and many of those from 2010, 
are glossed examples explaining what a hashtag is: even as recently as this, writers could 
not expect a general readership to understand the term. As might be predicted with a term 
closely tied to a particular technological development, there is no preliminary use before 
gaining wider prominence as there was with super PAC. Wider searching on Nexis and 
other newspaper databases turns up nothing earlier than 2007, and earlier use of the ‘hash 
tag’ on Google Books and Google’s archive of Usenet newsgroups shows an attributive use of 
the noun hash in a technical computing context, relating to the application of an algorithm 
to a string of bytes, which is already covered by OED¹¹ and is not relevant to assessment of 
the use of the word to refer to a marker of key words or phrases on Twitter. 

So despite our having very strong evidence of widespread use in the past few years, 
the term has a lifespan of less than ten years, and until very recently tended to be used in 
general sources with an accompanying explanation of its meaning. It is arguable, as with 
the case of podcast (see Section 33.3.2), that hashtag should nevertheless be included in 
OED, but the word really does seem a little young for us to be writing its biography at 
this stage, and on a number of counts it seems advisable to wait until there is a little more 
information to analyse: hashtag is currently overwhelmingly associated with one par- 
ticular social networking service; will this remain the case, or will it become a more gen- 
eral term used in other contexts? Will Twitter itself maintain its high-profile position 
in the media landscape, or fade as others such as MySpace and Friends Reunited have? 
There is already some evidence on Nexis (from 2009 onwards) for an incipient verbal 
use (‘The revolution will be hashtagged!’), and for an adjective hashtagged: are these and 
other grammatical developments here to stay? Will figurative usage emerge? Because of 
OED’s commitment to providing a full historical picture, it might be better to wait a little 
longer to have firmer answers to such questions: in the meantime Oxford dictionaries’ 
coverage of the term is provided by its online dictionaries of current English on ODO.¹² 


9 <http://techcrunch.com/2006/07/15/is-twttr-interesting/> (accessed 22 April 2012). 

10 <http://www.readwriteweb.com/archives/the_first_hashtag_ever_tweeted_on_twitter_-_they_s.php>, 
<http://en.wikipedia.org/wiki/Hashtag#Origin> (accessed 22 April 2012). 

11 Oxford English Dictionary Additions Series, ed. John A. Simpson and Edmund S. C. Weiner, Volume 
2 (1993: 141–2); <http://www.oed.com/view/Entry/84445> (accessed 22 April 2012). 


33.5.3 Advisorial 


Unlike the previous two examples, this is not a term which has come to prominence 
recently. In fact at no point in its history has it been particularly common. The word 
advisory can perform most of the same functions, and is the much more usual choice for 
English speakers. It might be argued that advisorial may have a more specific nuance, 
‘of or relating to an advisor’, as against advisory’s more general ‘having the function of 
or responsibility for giving advice; characterized by the giving of advice’, but in practice 
usage does not really bear this out: advisorial usually qualifies words such as ‘role’, ‘capac- 
ity’, ‘body’, and so on, where advisory would serve perfectly well. This apparent redun- 
dancy is of course not a bar to inclusion in the OED: there is an entry for historial, for 
instance, despite its equivalence to historic and historical (themselves near-synonyms in 
some contexts). The decision whether to include will be based, as ever, on examination 
of the evidence. As might be expected, this is less plentiful than the evidence for super 
PAC or hashtag. There are no instances of the term on either Incomings or the OEC. 
But there are forty-nine on Nexis, dating back to 1987, fifty-three on Proquest Historical 
Newspapers, including one use from 1933, and nineteen on JSTOR, the first of which is 
from 1926. Turning to Google Books, the numbers begin to look slightly more impres- 
sive: a search turns up 389 results, with the earliest being 1877. EEBO and ECCO turn up 
nothing earlier than this, but this does represent a lifespan of over 100 years (see Section 
33.3.2), from the late nineteenth century to the present day. Without being startlingly 
common at any stage, advisorial has enjoyed a relatively long period of steady, continu- 
ous usage. In terms of potential inclusion in the OED, this is a good example of a valid 
but not compelling candidate word. A good entry could be prepared for it, with a solid 
sequence of illustrative quotations available. The issue is really a pragmatic one of availa- 
ble time and resources: in practice many other, stronger candidates would be prioritized 
over advisorial, and it might be some time before the editorial time necessary to draft an 
entry for it could be spared. 


12 In fact, the situation even in historical lexicography can change quite rapidly. OED’s editors have 
looked again at hashtag while this chapter has been in preparation, and an entry has now been added to 
the dictionary. 
An entry for advisorial would be unlikely to be included in ODO: the lack of recent 
evidence on Incomings and OEC, together with a relatively weak showing on Nexis, 
would argue against its inclusion in a standard dictionary of current English. 
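
The lifespan reasoning applied to advisorial can be made explicit with a short calculation. The Python sketch below takes the earliest dates reported above for each source and works out the first recorded use and the number of years elapsed. The function name is hypothetical, and the end year of 2012 is an assumption based on the date at which the figures in this section were gathered.

```python
# Earliest dates per source, as reported in the discussion of advisorial above.
EARLIEST = {
    "Google Books": 1877,
    "JSTOR": 1926,
    "Proquest Historical Newspapers": 1933,
    "Nexis": 1987,
}


def first_use_and_span(earliest_by_source, as_of_year):
    """Return (source, first year, years elapsed up to as_of_year)."""
    source, first = min(earliest_by_source.items(), key=lambda item: item[1])
    return source, first, as_of_year - first


print(first_use_and_span(EARLIEST, 2012))  # prints: ('Google Books', 1877, 135)
```

The result agrees with the statement that the word has a lifespan of over 100 years, but a date drawn from Google Books would still need the verification described in Section 33.5.1 before being relied on.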


33.6 CONCLUSION 


The case studies in Section 33.5 (with regard to OED inclusion, effectively a ‘yes’, a ‘not 
quite yet’, and a ‘maybe’) are intended to exemplify many of the issues which one must 
consider when measuring a candidate item against the inclusion criteria using the 
resources described in this chapter. Any item assessed for inclusion will present its own 
peculiarities, and a flexible response to the specifics of the history of a particular word 
is necessary in order to make an objective yes-or-no, black-or-white decision in what 
is essentially one big grey area. Nevertheless, without having anything as reductive as 
a one-size-fits-all method for all candidates for inclusion, it is possible to make general 
points about the characteristics one would expect an item to have to merit inclusion in 
the OED. The aim of an inclusion policy is, of course, to meet user needs and expecta- 
tions, so that a person who consults the revised OED is presented with the full story of 
the historical development of a word, and is not disappointed to find that common or 
familiar items of vocabulary are omitted. 

While the survey above focuses on the methodology of the OED, all dictionaries face 
the same sorts of questions when deciding what to include or exclude, and whatever the 
methods employed, will need to address issues such as: 


• Pressures of space (especially for print rather than online dictionaries) and edito- 
rial resources (Section 33.2). 

• The frequency or spread of usage expected to qualify for inclusion (Section 33.3.1). 

• Whether there is an expected length of time for an item to have been in use (Section 
33.3.2). 

• The level of naturalization expected before a loanword or other item would be 
included (Section 33.3.3). 

• Whether factors such as what a dictionary already includes should influence deci- 
sions on inclusion, regardless of other factors such as frequency (Sections 33.3.4, 
33.3.5). 

• Whether to include particular types of vocabulary, e.g. taboo words, obsolete items, 
specialist vocabulary, collocations of transparent meaning (Section 33.3.6). 

• The resources to be used in assessing type and frequency of usage, whether a 
bespoke or existing corpus, or the various types of lexical information made avail- 
able by the Internet (Section 33.4). 


34.1 INTRODUCTION 


In linguistics, descriptivism and prescriptivism are commonly depicted as antonyms. 
The emphasis on ‘objectivity’ and ‘systematicness’ in descriptivism, Crystal’s Dictionary 
of Linguistics and Phonetics notes, ‘places it in contrast with PRESCRIPTIVE aims’; 
‘the aim of descriptive linguistics is to describe the facts of linguistic usage as they are, 
and not how they ought to be, with reference to some imagined state’ (Dictionary of 
Linguistics and Phonetics s.v. description). Being descriptive is what prescriptivism is 
not—and vice versa. ‘Linguists today understand their job as that of description, their 
purpose being to describe how people use language, not to prescribe how they should use 
it’, Kolln and Funk (2002: 4) affirm. 

Dyads of objectivity and subjectivity, of evidence-based analysis versus the pull 
of opinion, and of impartial engagement set against the idiosyncrasies of individual 
response repeatedly recur in such accounts. Yet, rather than forming simple opposi- 
tions, prescription and description can thereby be placed in markedly asymmetric rela- 
tion. Being descriptive is made part of the legitimate practice of linguistic response. 
Prescriptivism is, in contrast, both delegitimized and devalorized: ‘Prescription tries 
to change language by proscribing some forms that are in fact used and prescribing 
alternatives, where description accepts all forms that are used’, writes Richard Hudson 
(2010: 59). If descriptivism deals in facts, prescriptivism veers, in this light, towards the 
fictional: the ‘imagined standards’ (Crystal, Dictionary of Linguistics and Phonetics s.v. 
prescriptive) to which correct usage should necessarily conform. 

Such demarcations prove interestingly complex when we turn to lexicography where 
descriptive and prescriptive can co-exist within a single work (and, indeed, at times 
within a single entry). The precise point at which descriptivism shades into prescriptiv- 
ism can at times be difficult to locate. Descriptive processes of collection and evaluation 
of evidence can be accompanied by prescriptive (and proscriptive) reservation. While 
a general trajectory from prescriptive to descriptive can be identified in the history of 
dictionary-making, even this exhibits unexpected configurations, especially if agendas 
of moral and cultural prescriptivism are brought into consideration. 

That dictionary-making is perhaps, of necessity, ‘regulative’ was stressed, for exam- 
ple, by Derwent Coleridge (uncle of the dictionary’s first editor) in the early stages of 
the Oxford English Dictionary (OED) project. ‘The office of a Dictionary, a unilingual 
Dictionary more especially, is eminently regulative—regulative in effect, though declar- 
ative in form’, Coleridge (1860: 156) reminded his audience at the Philological Society. 
The dictionary-maker is engaged in the construction of a reference model, a didactic text 
which must, by its nature, engage with recommended norms: of meaning, of spelling, of 
lexical use. An ‘element of normativity’, Ladislav Zgusta similarly affirms (here with refer- 
ence to what he terms the ‘standard-descriptive dictionary’) will always co-exist with the 
specification of ‘what is generally regular, normal, what is the norm’ (1971: 211). 

The basis of such judgements, of what is deemed ‘normal’ in terms of both norms 
and normativity, is of critical import in assessing descriptivism and prescriptivism (and 
their interplay) within lexicographical history and practice. As this chapter will explore, 
linguistic response as manifested in dictionaries is subject to a variety of factors which 
may affect the form and shape of the eventual entry, and the interpretative slant adopted 
in a given work or by a given editor. Reception too offers other aspects of the contigui- 
ties which description and prescription can, in reality, reveal. ‘How do you spell disyl- 
labic dissyllabic?’, James Murray, editor (1878-1915) of the OED asked, drafting this entry 
in 1885. Only one, in the pragmatics of the dictionary text, could appear as headword; the 
other must appear as a secondary and variant form. But, as Murray acknowledged, ‘recent 
events... have shown that people will not be content to let me be purely historical; they 
will take me by force, & make me a king orthographically’. Descriptive statements could, as 
he was aware, be translated into prescriptive edicts in the act of reading the dictionary. In 
Coleridge's terms, the dictionary became regulative, even in the act of being intentionally 
descriptive. ‘I have to consider responsibilities’, Murray continued; ‘if I give a preference to 
Disyllable, multitudes will follow the standard’ and while ‘I prefer disyllable for my own 
practice... [I] have no desire to pass an Act of Uniformity’ (MP/4/1/1885).¹ 


1 Archival documents referenced in this chapter are listed at the end of the chapter. 


34.2 THE NATURE OF AUTHORITY 


As Murray’s letter confirms, assumptions about lexicographical authority can them- 
selves occupy conflicted territory. On one hand is the authority of evidence, on the 
other the presumed authority of the lexicographer. Each, in different ways, can inform 
the decisions made in shaping the published entry. Proper consideration of the former 
underpins the ideal of an authoritative dictionary. Here the information presented is 
well-founded, based on the judicious (and descriptive) analysis of the range of evidence 
available at the time of the dictionary’s composition. For disyllable, for example, while 
<ss> and <s> spellings were attested in the assembled evidence for the OED, the fact that 
distributional patterns² revealed an increasing preference for the latter influenced the 
headword order (disyllable, dissyllable) which was eventually chosen for the published 
entry. The relationship of prescriptive dictionary-maker to evidence is somewhat differ- 
ent. In this context, the intended authority of the dictionary-maker is superimposed on 
the facts of usage, discounting or marginalizing its significance. ‘It is a vulgar and gross 
error’ to use expect ‘in speaking of the past; as, I expect the mail has arrived’, Chauncey 
Goodrich hence declared in the 1862 edition of Webster’s Dictionary. The meaning is 
rejected, irrespective of supporting evidence. The dictionary-maker in such configu- 
rations moves towards the authoritarian, imposing his or her view on the patterns of 
usage, and discarding—at least in terms of the dictionary entry—what does not, for 
whatever reason, meet with approval. 

Even descriptive dictionaries such as the OED can occasionally display partialities 
of this kind. Henry Bradley’s editing of expect reveals a level of disapprobation which 
accords with that expressed by Goodrich (it is a ‘misuse’ which is ‘very common in 
dialectal, vulgar, or carelessly colloquial speech’, the entry declares). Quantitative and 
qualitative are divorced; what is ‘common’ may be deemed ‘careless’ (against the implied 
‘carefulness’ of conservative speakers who may not exhibit the change in question). 
Markedly evaluative diction confirms the presence of prescriptivism, here alongside the 
descriptive facts which the entry also provides. Murray’s insistence that rime (and not 
rhyme) was ‘intrinsically the best’ spelling, and should therefore be used throughout 
the dictionary, provides a similar example. While, as he admitted, rime was not ‘at pre- 
sent favoured by the preponderance of usage’ (1884: x), etymology (and the pull of his- 
tory) intentionally justified the policy (and underlying model of correctness) adopted. 
Rhyme was ahistorical; its etymologically motivated spelling disguised the fact that the 
word had entered Middle English as rime, ryme via French rime, rather than directly 
from Latin rhythmus. Usage and correctness are placed at odds in a clear conflict of 
descriptive and prescriptive principle. 

Close examination of behind-the-scenes evidence from the making of the OED fur- 
ther illustrates the complexities of descriptive and prescriptive process in this respect. 
For William Craigie, who would edit this section of the dictionary in the early twenti- 
eth century, the authority of evidence was paramount, and he drafted the text accord- 
ingly. ‘Rhyme is, and has been for at least two centuries, the standard spelling’, he wrote. 
‘Some... derivatives do not occur at all with the spelling rim-, others very rarely’; as 
he argued, to put these under a headword rime would ‘expose one to a charge of mis- 
representing facts’ (MP/18/5/1908). Rhyme (rather than rime) duly appeared in head- 
words and definitions alike. Models of a different kind were, however, conspicuous in 
Murray’s response. ‘It was settled long ago... that the Dictionary spelling should be 
rime’, he countered; ‘anarchy’ would be introduced by the preferential selection of rhyme 
(MP/22/5/1908). In the revisions which Craigie was subsequently made to impose, 
rhymeless was defined as ‘without rime; unrimed’, rhymeful as ‘abounding in rimes’, 
rhymer as ‘one who makes rimes’, and so on. These were, moreover, supported by quota- 
tions such as Southey’s ‘Should not rhymeless odes be as harmonious as possible?’, or 
Swinburne’s ‘Written in blank verse, or at least in rhymeless lines’ (my emphases). 


2 Of the four post-1880 quotations which appear under disyllable (n) in the first edition, three have the 
dis- form. 

‘I am quite prepared to defend the action publicly, whereas I could offer no defence 
of the practice which occurs in the made-up sheet’, Murray stated of Craigie’s original 
text. Whether this is—descriptively—true is, however, a moot point. Craigie had merely 
made use of empirically substantiated evidence; rhyme, in terms of distribution, was 
clearly the dominant variant by 1908, hence his own ‘defence’ of the lexicographical 
practice deployed. The changes instituted instead embody what Hudson (Section 34.1) 
identified as the classic territory of prescriptive action, in the attempted preservation 
of particular ‘standard’ forms against the proscription of others. Letters which Murray 
received during the process of making the dictionary make the descriptive disparities 
at stake still more apparent. ‘To hear bicycle made to rhyme (you say rime) with icicle,
distresses me’, James Dixon wrote in 1886 (my emphases)—even if in so doing he merely
brought other notions of correctness into the lexicographical foreground. 

Such insights from behind the scenes remind us that dictionary-making is a pro- 
cess of interpreting as well as gathering information. Precisely how, and why, different 
interpretations are proposed and accepted can be critical in influencing the particular 
trajectories which a dictionary can reveal. Even in the OED revisions under disyllabic, 
for instance, the complexities of descriptive/prescriptive boundaries (and the obliga- 
tions which Coleridge's ‘regulative’ practice suggest) are disturbingly transparent. If the 
descriptive case for disyllable was undoubted, evidence on disyllabic in the dictionary’s 
citation files pointed in precisely the opposite direction. Given with <s> in the published 
text (in the interests of orthographical consistency with related words), it was the <ss> of 
dissyllable which was uniformly attested in the underlying quotations. In this section of 
the text, was the dictionary descriptive or prescriptive or both? 


34.3 CONTROL AND CORRECTNESS 


The fact that rhyme and not rime has—in spite of Murray’s asseverations—remained 
the dominant variant suggests, of course, the limits of a dictionary’s authority. Popular 
convictions about the relationship of dictionary to language nevertheless often assume 
that the dictionary represents a form of linguistic government, its entries offering edicts 
to be obeyed. Such political metaphors can, in fact, be highly illuminating. Does usage 
offer an image of ‘anarchy’ and unregulated change, as, for instance, Lord Chesterfield
suggested in 1754, advocating the claims of Johnson’s Dictionary to constrain and con- 
trol diversity? More to the point, to what extent can the lexicographer also be a dictator, 
as in Chesterfield’s projected submission to whatever Johnson might propose (‘We must
have recourse to the old Roman expedient in times of confusion, and chuse a dictator. 
Upon this principle I give my vote for Mr. Johnson to fill that great and arduous post. 
And I hereby declare that I make a total surrender of all my rights and privileges in the 
English language, as a free-born British subject, to the said Mr. Johnson, during the term 
of his dictatorship’ (Boulton 1971: 97))? What are the rights of democracy—the language 
of the majority—in such a view? 

For Chesterfield, as for a number of earlier writers, lexicography’s rightful role was to 
provide a correct (and corrected) ‘standard’ of the language. Existing works, he noted, 
merely confirmed the abdication of proper prescriptive responsibility: ‘The injudicious
reader may speak, and write as inelegantly, improperly, and vulgarly as he pleases, by
and with the authority of one or other of our WORD-BOOKS’ (Boulton 1971: 96). In the
model Chesterfield advances, ‘superior’ dictionaries are selective and interventionist, 
exclusive and evaluative. 

Rather than documenting the realities of supra-local practice (as a descriptive engage- 
ment with standardization might suggest), such assumptions about ‘standard’ use centre 
on the imposition of uniformity, correctness, and control, with the dictionary as agent 
of active linguistic reform. Academy discourses in the seventeenth and eighteenth cen- 
turies acted, for instance, as prominent conduits for a shared emphasis on linguistic and 
lexicographical regulation. The Vocabulario of the Italian Accademia della Crusca (liter- 
ally, ‘the academy of the bran’) made its concerns with sifting and winnowing language 
particularly explicit; if the ‘bran’ is retained, the chaff of language—that deemed of less 
or lower value—is intentionally discarded. The Académie française made manifest similar
imperatives with regard to ‘pur usage’ in French.

Such models established a popular image of lexicographical process, by which the 
language is not to be represented per se, but is instead to be manifested as it ought 
to be, emblematizing ‘good’ or ‘best’ usage alone. As McArthur comments (1995: 382), 
dictionaries tend to display a high degree of ‘psychological fit’ with the dominant
linguistic ideologies of the period in which they are composed. Renaissance writers 
in Britain readily responded to the potential for reform of the vernacular language 
which lexicography apparently offered. ‘Why should not wee have such an excellent
Dictionarie to shew the nature and propertie of our English speech, as learned and
laborious Nicot hath made, shewing the utiltie and use of the French tongue’, as George
Snell (1649: 36) stressed, for example.³ A dictionary of this kind, he urged, would establish
English as ‘a settled, certain, and corrected language’ (1649: 35). Like Latin, English
could attain a classical and invariant grandeur: ‘The language of our Land, thus brought
to a fixed and immutable state... will not, as in former ages, so alter out of date and 


3 Jean Nicot's Thresor de la langue francoyse was published in 1606, containing 18,000 entries.
knowledg... wee shall reap all the same profits and advantages that wee see the glorious
Romans have gotten... all nations will esteem and honor the English-tongue next
to the Latine’ (1649: 40-1).

Often neglected in the history of English dictionaries, Snell's work offers a persua- 
sive image of prescriptive (and proscriptive) lexicography. Chesterfield’s hopes for the 
‘lawful standard of our language’ which Johnson's work would provide were similar. Yet, 
even in the eighteenth century, critical readings of prescriptive control can emerge. As 
Johnson pointed out in his ‘Preface’ (Dictionary of the English Language, 1755: sig.C2r),
there was in fact ‘no example of a nation that has preserved their words and phrases 
from mutability’; the lexicographer who believes that ‘his dictionary can embalm his 
language, and secure it from corruption and decay’ merited derision rather than praise. 
As Johnson confessed, ‘Those who have been persuaded to think well of my design,
require that it should fix our language, and put a stop to those alterations which time
and chance have hitherto been suffered to make in it without opposition.’ Yet, as nine
years’ work on the Dictionary confirmed, any ‘opposition’ which lexicography might 
provide is constrained by the very nature of a living language, and the essential ‘liberty’ 
(1755: sig.C2v) of its speakers. ‘Neither reason nor experience’ could, in reality, justify 
trying to give ‘longevity to that which its own nature forbids to be immortal’ (1755: sig.C2v).
French, as Johnson pointed out, had continued to change in spite of the ministrations
of an Academy; the Dictionnaire was revised in response to a changing language,
not the other way round. Control, in this respect, was clearly an illusion. 

At least in part, Johnson here confounds McArthur’s maxims of ‘psychological fit’,
moving away from prescriptive resolution even in the act of drawing his dictionary to a
close. His Dictionary witnesses both descriptive and prescriptive impulses. The ‘Preface’,
written at the end of the dictionary process, attests conclusions which stress the fallibili- 
ties of prescriptive belief. These differ markedly from the ambitious ‘design’ of purify- 
ing discourse with which the project began. Nevertheless, entries within the dictionary 
proper can indeed manifest the kind of resistance to change and innovation which 
Chesterfield (and Johnson's publishers) expected. Change in progress in precarious is 
proscribed (‘no word is more unskilfully used than this’, Johnson declares); semantic
divisions are neatly separated into legitimate and otherwise (‘It is used for uncertain in 
all its senses; but it only means uncertain, as dependent on others’). Usage and correctness
divide. Roundabout ‘is used as an adjective ... by a colloquial license of language,
which ought not to have been admitted into books’, writes Johnson, providing evidence
of usage and language attitudes alike. Collocations such as ‘most peculiar’ are deemed
‘improper’. More typical, however, is Johnson’s careful engagement with the quotation
evidence he assembled for his dictionary which provides the basis of, say, his rigorous 
analysis of phrasal verbs (113 senses under take, 66 under put). If Johnson is often made 
to emblematize the prescriptive lexicographer at work, in reality a spectrum or contin- 
uum of response appears in his Dictionary in which usage (the reality of language prac- 
tice) is often descriptively given its due. 

As the sociolinguist Peter Trudgill (1999: 125) points out, a ‘standard’ is, in reality, by
no means reducible to ‘a set of prescriptive rules’ which articulate norms of correctness
removed from what speakers habitually do. Trudgill’s linguistically informed analysis of
standardization importantly enables the links between lexicography and standard varieties
to be established with greater clarity. A ‘reliable dictionary’, Atkins and Rundell
(2008) state, is ‘one whose generalizations about word behaviour approximate closely 
to the ways in which people normally use (and understand) language when engaging in 
real communicative acts’ (2008: 45). In this respect, a ‘standard’ (and the lexicographical 
description of such) is located in the consensus norms of a national speech—in all its 
contextual variation. To configure a ‘standard’ in the evaluative diction of good and bad, 
‘vulgar’ and ‘erroneous’ (especially when this runs counter to the directions of language
practice) places lexicography in far more problematic territory. As always, the conjunc- 
tion of evidence and interpretation is key. 


34.4 PATROLLING THE BORDERS 


What is (or is not) included in a given dictionary presents other opportunities for
prescriptive/descriptive conflict. (Compare Diamond, this volume.) No dictionary, however
descriptive, can include all words and meanings. Even OED1, intended as an ‘inventory’
of the language (Trench 1860: 4), omitted certain words, or categories of word—silencing
taboo words, for instance, or the language of contraception (see Burchfield 1989: 83-109; 
Mugglestone 2007), as well as omitting countless words because of pressures of space and 
the costs of editing (see Mugglestone 2005). In real terms, the pragmatics of publication 
for a living language constrain full representation; editing a word costs time and money 
and, in a physical text, space too. Likewise, a static text will necessarily lag behind a
language on the move. Words on the margin, as in Burchfield’s (1989: 84–5) discussion of
provisional entries such as miticide and mithril in the OED Supplements, reveal the
problems of choice and selection that lexicography also involves; data, editability, and
currency all come into play in determining the admission of a word or sense.

‘Drawing the line’ can, however, often bring the lexicographer into highly conflicted 
territory. The recent edition of the Oxford Junior Dictionary, with its omission of Empire 
and monarch, or words associated with Christianity (aisle, disciple) in favour of new 
entries such as MP3 player and broadband provides a useful example. Here the process 
of revision, intended to bring the dictionary up to date for its target audience, elicited 
accusations both of excessive descriptivism (an unwarranted pandering to neologism) 
as well as of proscription and the deliberate silencing of the particular images of tradi- 
tional British culture. In reality, however, the reorientation of entry-words was driven 
by the concerns of representativeness, descriptively supported by quantitative data from 
the Oxford Children’s English Corpus.⁴ The documentation of new lexical and semantic


4 See e.g. the response of Anthony Seldon, headmaster of Wellington College: ‘I think as well as being 
descriptive, the Oxford Junior Dictionary has to be prescriptive too, suggesting not just words that are 




items on OED Online can attract similarly charged reactions. ‘The stalwart bastion of
language, the Oxford English Dictionary, will now include ♥ and LOL as real words worthy
of etymological recording’, wrote the Huffington Post in 2011. ‘Much to the dismay of
language purists the internet slang term LOL has been officially inserted into the Oxford
English Dictionary’, the website techimpulsion.com proclaimed: ‘Their dismay gets
converted into horror when they find that giving company to LOL is the internet slang
OMG.’

Inclusion, as here, is often seen as a prescriptive act of legitimization, as proof that a 
form has ‘really’ entered the language. As in the Huffington Post’s assumptions about 
LOL’s new-found identity as ‘real word’, the dictionary is seen as conferring sanction
and acceptability. Omission can conversely be interpreted as proscriptive silencing. The
dictionary-maker is constructed as gate-keeper, momentarily opening up the ‘bastion’
to new members, irrespective of the fact that, in modern evidence-based lexicography,
the process runs in precisely the opposite trajectory. Usage—the democracy of words— 
governs the decision to include or exclude a given word or sense (although the larger 
a dictionary is, the more it will, of course, be able to include). A range of citations, for 
example, provide indisputable testimony for the frequency of LOL and OMG in current 
usage. Worth remembering too is the fact that the citations included within an entry by 
no means represent the totality of evidence at the dictionary’s disposal. In reality, a far
wider engagement descriptively underpins the conclusions drawn. 

Conflicted readings of this kind repeatedly surface with reference to neologisms or 
the spread of loanwords. ‘Men might have no more libertie to write or innovate new 
language, than is permitted them to stamp and coin monie’, wrote Snell in 1649 (41-2),
advocating another aspect of prescriptive and restrictive lexicography. Purism, as such
comments indicate, can be a particularly potent prescriptive impulse. While the borders
of any language are indeterminate (‘there is absolutely no defining line in any direction’,
Murray wrote in the ‘General Explanations’ of the OED (1888: xvii)), notions that a line
should be drawn—whether on the grounds of history, or need, or simple and visceral 
dislike—are commonplace, in lexicographical history and language attitudes alike. 
Ideas of lexical necessity, for instance, often appear in Johnson's Dictionary in line with 
his pointed comments on the ‘folly of naturalizing useless foreigners to the injury of the 
natives’ (1755: sig.Biv). Recent French loans such as ruse and finesse are deemed unnecessary;
included in the dictionary, they are nevertheless placed outside the requirements
of the lexicon. Need, however, is a complex mechanism; here Johnson's prescriptive 
assessment of ‘need’ (and utility) in terms of existing lexical and semantic fields is, for 
instance, countered by the realities of ‘need’ as adjudged by users themselves. As the 
subsequent history of ruse and finesse confirms, loanwords acquire citizenship not 
through lexicographic sanction but through processes of assimilation where language 
users rather than lexicographers have the metaphorical last word. Dryden’s introduction 


used but words that should be used. It has a duty to keep these words within usage, not merely pander 
to an audience. We are looking at the loss of words of great beauty. I would rather have “marzipan” and 
“mistletoe” than “MP3 player”’ (Henry 2008).
of French-derived fraischeur fails to reach mainstream usage not because of Johnson's
proscription (‘A word foolishly innovated by Dryden’) but because, descriptively, writers
and speakers continued to prefer ‘freshness’ and ‘coolness’.

Change in progress in terms of new sense divisions can operate in a similar way. 
Words such as transpire and fortuitous provide interesting snapshots of different pat- 
terns of lexicographic response. The semantic split of the former, by which a new sense 
signifying ‘to happen’ gradually came into widespread use, clearly met interpretative
resistance in OED1, in spite of accompanying evidence from the late eighteenth century
onwards. As the entry confirms (‘Misused for: To occur, happen, take place’), it was
tempting to see this as simple mistake—as testimony to writers misinterpreting what 
the word ‘really’ meant. Metalanguage is, as always, a reliable indicator of prescriptive 
or descriptive response. ‘Misuse’ consigns this sense of transpire, at least intentionally, 
to the realm of linguistic error, a contravention of accepted norms. Yet, if we look at the 
range of ‘mis-usages’ presented, they are strikingly widespread, taking us either into the 
problematic concept of ‘mass error’—or, if we read the same evidence descriptively, into 
the territory of a newly derived sense-division which was, even in print, already diffus- 
ing rapidly. 

Similar patterns inform the recording of semantic shift in fortuitous. The American 
Heritage Dictionary (AHD) hence treads a careful line between what is described as its 
‘best-established sense’ (‘“happening by accident or chance.” Thus, a fortuitous meeting
may have either fortunate or unfortunate consequences’), the transferred meaning
which has ‘often been used in reference to happy accidents’ (The company’s profits were
enhanced as the result of a fortuitous drop in the cost of paper), and what is given as the 
‘more controversial’ meaning ‘lucky or fortunate’ (as in ‘He came to the Giants in June 
as the result of a fortuitous trade that sent two players back to the Reds’). While historical 
testimony for the latter is given (“This use dates back at least to the 1920s’), presumably 
in an attempt to defuse controversy, so too is the evidence of language attitudes (‘it is 
still widely regarded as incorrect’). As the entry confirms, reconciling descriptive and 
prescriptive can be challenging, especially in a change in progress where norms of usage 
and norms of correctness continue to be in conflict. (In AHD5 an updated commentary
on changing attitudes over time is substituted here.) As the proofs of OED1 indicate,
however, the ‘controversial’ sense of fortuitous has existed for far longer than the editors
of AHD suspected—as have attendant language attitudes. A hand-written note by
Fitzedward Hall, one of the OED’s critical readers, on the relevant proof sheet (dated 31 
Oct 1896) states: ‘Fortuitous = fortunate 1799. A gross error, not worth recording, & so
not sent.’ Hall simultaneously indicated the existence of a new sense, and prescriptively
edited it out of the evidence supplied for the dictionary. Modern lexicography strives to 
achieve a different balance by its emphasis on the facts of usage, as well as documenting 
(while not endorsing) the sensibilities which such usage may provoke. ‘Fortuitous tends
to be often used to refer only to fortunate outcomes and the word has become more or
less a synonym for “lucky” or “fortunate”’, states the relevant entry in the 2010 Oxford
Dictionary of English (ODE); ‘Although this usage is now widespread, it is still regarded
by some people as incorrect.’


34.5 THE SOURCES OF EVIDENCE


‘The Dictionarist, like an Historian, comes after the Affair; and gives a Description
of what has passed’, wrote Ephraim Chambers (1728: xxii). Long before the OED or
Richard Chenevix Trench’s formative lectures of 1857, Chambers articulated the salient 
precepts of descriptive lexicography. The dictionary-writer must pay attention to the 
facts of language that already exist—rather than, as for Johnson's entry for precarious 
or Goodrich’s for expect, trying to direct current and future usage alike. Nevertheless, 
even Chambers’s axiom—valid as it is—raises wider questions. Which facts, and how 
many, are to be taken into account? Are all facts equally valid, or should a line be drawn
here as well? The nature (as well as the use) of evidence collected for a dictionary are further
important facets of the alignment of dictionary-making with descriptive and prescriptive
principles. Quality as well as quantity emerges as a recurrent topos in debates
about lexicographical authority, especially in terms of the limits of descriptivism and 
the legitimization which might (or not) be provided by the non-canonical. ‘Who are we
that we should use... graphiology, employed in one sense by Lady Lytton and another
by the Daily Telegraph?’ argued E.J.E. (1900: 547) with reference to evidence in the OED:
‘Individualic may be an excellent word, but one may never have heard of J. Gilchrist who
used it in 1824.’ If ‘the Medical News and the British Medical Journal call a person unfit
for an operation inoperable’, are they ‘justified in doing so?’ 

The fact that dictionary citations were illustrative, attesting usage and currency, 
rather than exemplary—acting as models with potentially prescriptive force—was, for 
many, difficult to accept. While their canonicity is undoubted, literary writers such as 
Shakespeare or Browning are, for example, scarcely representative of ordinary usage, 
raising questions about their utility to the dictionary-user as illustrative examples, as 
well as to the dictionary-maker in the attempt to derive meaning. Presuppositions that 
good writers attested good usage, and that this was what a dictionary should provide 
nevertheless prove of surprising longevity. Historical precedent was clearly influential in 
this respect. While modern descriptive lexicography centres on knowing how language 
is typically used (‘the behavior of each word... in its natural contexts’, the New Oxford
American Dictionary (2010) stresses), hierarchical models of data selection commonly 
underpinned earlier lexicography, as in Abel Boyer’s conviction that ‘best authors’ alone 
legitimize the language practices on which entries are based. ‘Words... found in any
Writer of unsufficient Authority’ are to be marked ‘Dubious’ while ‘The English [is]
collected chiefly out of the great masters of the English tongue, such as Archbishop 
Tillotson, Bishop Sprat, Sir Roger l’Estrange, Mr. Dryden, Sir William Temple’ (Royal 
Dictionary, 1699: title-page). ‘It will be proper to observe some obvious rules, such as 
preferring writers of the first reputation to those of an inferior rank’, Johnson (1747: 30-1)
likewise affirmed. 

The fundamental principle of citations—the gathering of a corpus of evidence—as the 
basis of descriptive dictionary-making was enshrined by writers such as Franz Passow 
who saw the citational history of a given word as a prime means by which it could, in
effect, tell its own story (compare Considine, this volume). As Herbert Coleridge 
stressed in his early editorship of the OED, ‘the theory of lexicography which we profess
is that which Passow was the first to enunciate clearly and put into practice
successfully—viz., “that every word should be made to tell its own story”’ (Trench
1860: 72). Evidence thereby also becomes the means by which the potential partiality 
of the dictionary-maker can be excluded. Which story is told will nevertheless depend 
upon the nature of the evidence collected. Johnson's strictures, for example, were clearly 
relaxed in the process of data collection. ‘Words must be sought where they are used’ 
(1755: sig.Bav), he admitted: ‘Some of the examples have been taken from writers who 
were never mentioned as masters of elegance or models of stile.’⁵ A skewed empiricism,
however, often remains in evidence. Literary texts (and male writers) undoubtedly 
dominate. Crystal’s ‘systematicness’ (Dictionary of Linguistics and Phonetics s.v. descrip- 
tion)—identified as an important prerequisite of descriptive analysis—is likewise awry. 
Reading Burton’s Anatomy of Melancholy, Johnson marks words such as culminating, 
partile, gravity, and muck-hill for inclusion. Yet other words in the same text (horse- 
mill, Painter’s-shop) are neglected. Johnson's statements that jeopardy is ‘a word not now 
in use’ or, under smouldring, that “This word seems to be a participle; but I know not 
whether the verb smoulder be in use’ presumably derive from similarly imperfect acts of 
reading. Describing language accurately depends upon the nature of the facts to which a 
dictionary-maker has access, as well as upon the interpretative strategies deployed. 
Qualitative considerations—with their underlying agenda of prescriptive and proscriptive
response—could also be difficult to eliminate. The validity of newspaper
citations was a case in point. As information from the making of the OED confirms,
such sources were deemed of incontestable value by Murray. Enabling the dictionary-maker
to get much closer to ‘real’ language, ‘they show’, he argued, ‘how the language
grows... make visible to us the actual steps which for earlier stages we must
reconstruct by inference’ (MP/9/6/1882). The Delegates of Oxford University Press 
maintained a rather different notion of lexicographical appropriacy: ‘Should not the 
quotations illustrative of modern literary words be taken from great authors, and the 
language of newspapers banished?’ they demanded (MP/Meeting/1883). Evidence
deriving from popular fiction could also be controversial. Henry Hucks Gibbs 
(another advisor on OED1) condemned the popular Victorian novelist Elizabeth 
Braddon as ‘hasty’ (‘she grinds out novels by the yard, and does not give herself time to 
think whether she is writing good English or not’ (MP/3/5/1883)). Similar arguments 
informed popular response to evidence used by Philip Gove in writing Webster's Third 
New International Dictionary. ‘Systematic reading of books, magazines, newspapers, 
pamphlets, catalogs, and learned journals’ had provided a wealth of new information 


5 See e.g. John Arbuthnot’s Essay Concerning the Nature of Aliments (1731) under alimentary (‘Of
alimentary roots, some are pulpy and very nutritious; as, turneps and carrots’), and John Mortimer’s The 
Whole Art of Husbandry (1707) under fill n. (‘This mule being put in the fill of a cart, run away with the 
cart and timber’). 
with which to describe contemporary American usage, Gove stressed, emphasizing
the descriptive value of ‘independent investigation of usage borne out by genuine cita- 
tions’, whether these derived from actors, journalists, politicians, or more established 
writers. Gove’s insistence on such quantitative salience was met with qualitative res- 
ervation in many reviews (see Morton 1994). Yet, as in Gibbs's censure of Broughton, 
qualities such as ‘hastiness’ could in fact bring evidence closer to the realities of ordinary
language practice—circumventing the artificiality which a closely monitored
style can bring. If Gibbs subjectively contested Broughton’s status as ‘authority’ in the 
processes of lexicography, the argument could clearly be reversed. 

Dictionary-making in this respect presents a changing pattern of empirical engagement
and evaluative bias. While Murray’s descriptive ideals could, in practice, be challenged,⁶
the modern OED, as in its inclusion of doh and OMG, is firmly orientated
towards the realities of ordinary—rather than exemplary—usage. Citational evidence 
from journalism and popular discourse can characterize new entries, as well as revi- 
sion of older ones. Recent editions of Merriam-Webster attest twenty-first-century 
blends without resistance—as in staycation, frenemy, and vlog (a video-blog). Modern
corpora often eliminate the poetical altogether, while the literary is restricted in com- 
parison to types of data which provide a more rigorous—and balanced—engagement 
with language as deployed by the majority rather than minority of its users. Collins 
Dictionaries base their evidence on the Bank of English (over 525 million words of writ- 
ten and spoken English by 2005). As John Sinclair (1987a) stressed in establishing the 
Cobuild project, the corpus was ‘a device through which the user will observe the living
language. Not the frozen fillets of the printed citations, nor the stuffed dummies of the 
made-up examples, but the language as it is when it is being used’. Frequency scores 
provide empirical (and indisputable) evidence for currency and change. 


34.6 ACTS OF INTERPRETATION 


In Passow’s image of lexical biography, the word, as we have seen, tells its own story. The 
dictionary-maker merely facilitates the transmission of the facts. ‘The dream of the historical
lexicographer is that the quotations chosen will be so apposite, will so accurately
reveal the meaning and uses of the words they illustrate, that nothing more will need to
be added’, as William Ramson (2002: 15), editor of the Australian National Dictionary,
later wrote. Yet such distancing can, in practice, be problematic. If the data provide the 


6 Data collection by volunteer readers could lead to over-representation of literary texts, as could
the use of concordances to Shakespeare—which record every word in the Shakespeare canon. Likewise
pressures of space could mean that newspaper data, often included at the end of an entry, proved
particularly vulnerable to the processes of editing out, even if it was considered in the underlying analysis
of an entry (see Mugglestone 2005).
documentary evidence, it is, of necessity, the dictionary-maker who must shape the
entry, telling the story, in effect, from a particular point of view. 

Cultural prescriptivism—the presenting of norms and images of normativity which 
are ideologically rather than empirically affirmed—is a further aspect of lexicographical 
practice which demands scrutiny in this respect. It too can trade on a loaded metalanguage
in which notions of acceptability (or otherwise) can come to the fore. ‘A dictionary’,
Trench stated in his lectures, is ‘the history of a nation documented from one point
of view’ (1860: 6). If, on one level, Trench’s words indicate the intersection of language 
and material culture (the markers of a particular way of life, characteristic of one nation 
rather than another, together with their associated cultural matrices), it is clear that 
‘point of view’ can intervene in ways which also move outside this descriptive remit. 
Individual point of view can be detected in, say, Johnson’s entry for foxhunter (‘a man 
whose chief ambition is to shew his bravery in hunting foxes’). More pervasive cultural 
and ideological positions can be at work elsewhere. Discourses of religious identity can 
be particularly problematic. Pronominal positioning (and negatively charged diction) in 
Bailey's Universal Etymological English Dictionary firmly aligns ideal readers with main- 
stream Christianity (and against Catholicism); see e.g. crucifix: ‘a Figure representing 
our Saviour on the Cross’; limbus: ‘according to the Notion of the Roman Catholicks...a 
Place where the deceased Patriarchs resided till the Coming of our Saviour. Exorcism 
in Dyche and Pardon’s New General English Dictionary (1735) is a ‘practice, imposing 
much upon the credulity of their blind adorers. Similar notions of norm and ‘other’ 
can appear in OEDi, as under infidelity, which is intentionally explained by the gloss 
‘Muhammadanism; heathenism’ or ghazi, defined as ‘a champion against infidels. The 
elaboration of the latter (‘In modern use, chiefly applied to Muslim fanatics who have 
devoted themselves to the destruction of infidels’) is surely ripe for revision.’ 

Sex, race, and taboo of various kinds all present similar opportunities for the imaging 
of cultural norms, whether in the disapprobation evident in Victorian definitions of 
masturbation (‘the practice of self-abuse’) or the illustrative citations given for tobacco 
in more recent works (‘Try to do without tobacco and alcohol’, in Cobuild). The morally 
loaded diction of earlier texts is, however, now rare. ‘Lust’ (‘chiefly and now exclusively 
used implying intense moral reprobation’, OED lust sense 4) hence moves outside defining 
practice in revising pandar (OED1: ‘one who supplies another with the means of 
gratifying lust’; ‘a person who provides another with the means of sexual gratification’ in 
OED3). Prostitution is likewise revised upon factual rather than moral lines (OED1: ‘The 
offering of the body to indiscriminate lewdness’; OED3: ‘the practice or occupation of 
engaging in sexual activity with someone for payment’). Configurations of ‘unnaturalness’ 
in the defining strategies for words such as tribade (OED1: ‘A woman who practices 
unnatural vice with other women’) present a further case in point. Cultural proscription 
articulated the boundaries of ‘natural’ and ‘normal’, legitimizing heterosexuality 
above homosexuality, and sexual continence above sexual activity. Tribade: ‘A woman 


7 This entry, taken from OED Online in July 2012, already bears the signs of partial cultural revision, in 
the displacement of ‘Mohammedan’ (used in OED1) by ‘Muslim’. 
who engages in sexual activity with other women; a Lesbian’, OED3 now states. As here, 
revision for new editions can also mean revising cultural norms and attitudes which 
the nuances of definition have hitherto expressed, and displacing prescriptivism with 
intentionally neutral description in this respect as well. Yet, as Kaye (1989) argues, even 
corpora do not necessarily resolve such problems of cultural bias. The dictionary-maker 
must still select examples, as well as decide on definition and the nature and type of 
sense-division. 

The divide of description and prescription can therefore remain complex. Definitions 
reflect the dominant—and naturalized—ideologies of a given period, elucidating norms 
which seem ‘normal’ in their own time, whether this is in terms of sexual tolerance or 
repressiveness. The diction of openness, equality, and sexual tolerance, for instance, 
articulates relevant socio-cultural norms in modern lexicography, in line with changing 
social attitudes. Yet those who do not share such positions may also perceive the pre- 
scription (and proscription) of particular viewpoints in ways which may perhaps seem 
equally problematic. Descriptivism and prescriptivism can, in practice, be riven with 
areas of potential ambiguity and overlap. In the OED’s attempts to capture the negative 
semantic prosodies of serviette, for example (‘It may now be regarded as naturalized, but 
latterly has come to be considered vulgar’), does ‘vulgar’ proscribe, or is it a way of trying 
(if in a somewhat heavy-handed way) to engage with the socio-cultural matrices of use? 
Such markers also have potentially dissuasive force, as well as dividing dictionary and 
dictionary-writer from a set of users who might indeed use serviette—but who would 
not consign themselves to ‘vulgarity’ as a result. 

Absolute neutrality—the objective essence of descriptive lexicography—can there- 
fore be extremely difficult to achieve. To look at the entry for literally, say, in the Oxford 
Dictionary of English (2010) is to witness the complex juggling required in the attempt 
to achieve balance, here in representing an often stigmatized pattern of usage. The 
non-literal sense, as in ‘I literally died’, is, we are told, ‘very common’ but ‘can lead to 
unintentionally humorous effects’ which can mean that it is ‘not acceptable in formal 
contexts’ even though ‘it is widespread’. As this level of detail and specification confirms, 
the modern sense of a ‘good’ dictionary is clearly distinct from the prescriptive 
ideals espoused by Snell and Chesterfield. ‘A good dictionary reports the language as 
it is, not as the editors (or anyone else) would wish it to be’, the New Oxford American 
Dictionary (2010) avers. If popular language attitudes can still exhibit robust prescriptive 
allegiances, the dominant trends in dictionary-making have moved to the opposite 
side of the spectrum. ‘“Discomfit” does not mean to make uncomfortable’ but 
‘means to rout or overwhelm’, a recent article in The Times categorically announced, 
contesting its use by David Cameron. ‘The words discomfit and discomfort are etymologically 
unrelated but in modern use their principal meanings have collapsed into 
one: “make someone feel uneasy”’, the Oxford Dictionary of English counters, firmly 
severing descriptive realities and prescriptive fictions. Objectivity displaces subjective 
response. Telling the story, aided by digital lexicography (and the scale of data analysis 
which this facilitates), has, in this respect, certainly moved much closer to Passow’s 
ideals. 
35.1 INTRODUCTION 


MANAGING a dictionary project is no simple matter, as James Murray and his colleagues 
found out in the course of the prolonged gestation of the Oxford English Dictionary (see 
for example Murray 1977). But things have changed considerably since Murray was pay- 
ing his children on a sliding scale according to their age to sort quotation slips. These 
changes have, of course, happened over a long period, but in the last 20 years or so, 
methods of dictionary compiling and project management have moved into a new era, 
thanks to developments in computer technology and computational linguistics. In the 
limited space available here, we will outline a methodology that harnesses the resulting 
resources. To this end, we will focus primarily on commercially produced synchronic 
dictionaries intended for a wide readership (including monolingual and bilingual dic- 
tionaries, both for general users and for learners). 
35.2 DEVELOPMENTS IN DICTIONARY 
PROJECT MANAGEMENT 


35.2.1 The Project Outline 


Time was, when the chief editor of a dictionary project presided, largely single-handedly, 
over the entire management of the project in hand. This meant juggling between taking 
all the decisions concerning editorial policy, monitoring the quality of the lexicogra- 
phy, managing the lexicographers on a day-to-day basis, as well as drawing up budgets 
and schedules, monitoring these, and progress-chasing. In addition, the chief editor was 
expected to produce regular reports on every aspect of the project. Traditionally, the 
chief editor was a highly skilled lexicographer, with a wide experience of developing lex- 
icographical policy and of editorial decision-making. Person management skills were 
of course, and still are, essential to the job of being a chief editor. But a frequently heard 
lament was that the time spent monitoring budgets and schedules left not nearly enough 
to attend to lexicographical questions. Sidney Landau back in 1984 put the dilemma in a 
nutshell: 


The editor-in-chief has the ultimate responsibility for producing the dictionary on 
schedule and within budget .... Whether the editor-in-chief is invited to participate 
in the business decisions affecting the profitability of the dictionary—its pricing, 
production costs, and so on—depends more on his particular situation within the 
company than his position as editor-in-chief. In any event, much of his time is spent 
in monitoring costs .... He is also responsible for the quality of the work and does as 
much editing as time permits. (Landau 1984: 237)¹ (our italics) 


The processes of so-called ‘content-neutral’ project management (people, sched- 
ules, budgets, workflow, etc.) and those of the lexicography management will always 
intermesh to an extent, but developments over the last two decades or so in compu- 
tational dictionary management tools along with increasingly sophisticated diction- 
ary editing software have made it possible to separate out to a large extent the strands 
of these two aspects of a dictionary project. This in turn has opened the way to much 
more time-efficient and skill-efficient methods of project management. In addition to 
considering the benefits of computational advances for the various editorial processes, 
which we will look at in some detail, we will also outline how recently available technol- 
ogy allows tasks such as textflow management, scheduling, and batching, to be accom- 
plished using purpose-designed computational tools. Equally, statistics can be readily 
extracted to produce status reports and project financial accounts at given intervals. 
Separating out the content-neutral aspects of the project management to be handled by 


1 Passage retained with very slight textual changes in Landau (2001: 355). 
personnel with experience in business administration and management leaves the chief 
editor free to concentrate on the quality of the lexicography. 


35.2.2 Technology 


In the world of commercial dictionaries, it is now rare—or even unknown—for a dic- 
tionary project to be embarked upon with an open-ended timescale. Schedules are 
always tight and budgets follow suit, so production planning is likely to encompass the 
use of sophisticated computational approaches to the creation and analysis of the linguistic 
resources as well as to the actual creation of the dictionary. Customizable software 
which provides an online editing environment, coupled with sophisticated management 
tools, has changed the face of dictionary writing. Such a system allows lexicographers 
to encode the elements of the macrostructure of the dictionary and microstructure of 
the entries from the outset, as well as providing a variety of tools to facilitate their task 
(see Section 35.3.3 on the Dictionary Writing System or DWS). 

We will talk in more detail later in the chapter about the editing environment, the 
issues surrounding the lexicographers’ working methods, and the linguistic and analyti- 
cal tools they will have at their disposal. 

It is the job of the management team to ensure that the financial resources allocated to 
the project are put to optimal use. In the twenty-first century this means considering any 
technical contribution that can speed up and refine the work of the lexicographers. When 
we talk about the application of technology to dictionary production we are not talking 
about a dictionary written by a computer. However, computer-assisted lexicography and 
project management can undoubtedly deliver benefits in terms of both cost and quality. 

The benefits to users and dictionary creators alike of a growing familiarity with technology 
are still unfolding; but what is clear is that the ways in which users access dictionaries 
have become more varied. Users nowadays are more demanding and above 
all ‘tech savvy’. In recent years there has been a strong trend for the digitally literate, in 
particular young learners and students, to access all the information they seek ‘online’; 
this is true of dictionaries too, whether for a quick look-up of a definition, to check a 
spelling, or for more systematic enquiries from language learners. Publishers responding to 
new markets will usually, even where a print dictionary is the primary version, want to 
have an electronic version alongside. 


35.3 PRE-COMPILATION PROCESSES 


35.3.1 Planning 


The dictionary type and, at least to some extent, the user profile, that is to say the pro- 
jected user market, what types of information users will need from their dictionary and 
database that provides a headword list and a detailed grammatical and semantic analysis 
of the behaviour of each headword, along with examples. Nowadays, a framework will 
be the fruit of sophisticated corpus analysis and will use corpus examples. An example 
of such is DANTE, the lexical database of English commissioned by Foras na Gaeilge 
(see <www.webdante.com>). 


35.3.5.1.3 Start from Scratch 
Establish a headword list using frequency criteria derived from a large text corpus. 
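
As a rough illustration of the frequency criterion, the following Python sketch counts word forms in a text and keeps those at or above a threshold. The function name, the tokenizing pattern, and the threshold are assumptions made for this example only; a real project would work from a lemmatized corpus of many millions of words rather than raw word forms.

```python
import re
from collections import Counter


def candidate_headwords(corpus_text, min_freq=2):
    """Return (word form, count) pairs occurring at least min_freq times,
    most frequent first and alphabetical within the same frequency."""
    # Naive tokenization: lower-cased alphabetic strings, internal hyphens allowed.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", corpus_text.lower())
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(word, n) for word, n in ranked if n >= min_freq]


sample = "The cat sat. The cat ran. A dog sat."
print(candidate_headwords(sample))
```

The output is only a candidate list: inclusion policy and editorial judgement still decide what becomes a headword.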


35.3.5.1.4 The Case of the Bilingual Dictionary 

For a bidirectional bilingual dictionary the headword lists for both languages will need 
to be established. This is usually done according to the methods above. Another option 
is to create the headword lists by reversing existing bilingual dictionaries. This is done 
in order to establish a headword list of the initial target language and requires compu- 
tational input in order to derive a lemmatized list. It should, however, be borne in mind 
that the results of such a method will require substantial post-editing, keeping a careful 
eye on a headword list derived from a large text corpus. 


35.3.5.2 Inclusion 


Again, this topic is covered elsewhere in this Handbook (see Diamond, this volume). 
From a project management point of view, and particularly in terms of the day-to-day 
work of the lexicographers, a firm and duly considered policy needs to be set up well in 
advance of the start of work on what type of item will qualify for inclusion in the headword 
list (encyclopaedic items, technical terms, etc.). 


35.3.5.3 Classifying the Headword List 


Classifying the headword list, ideally according to both lexical type and relative com- 
plexity, provides the foundations of editorial and workflow management (see Atkins 
and Grundy 2006). This will mean building a process of classification into the headword 
list as it has been established. A basic set of lexical types, which we can think of as a hori- 
zontal cut, might for example be: 


• standard lexical items; 
• proper nouns/encyclopaedic items; 
• function words: modal verbs, prepositions, etc.; 
• abbreviations. 


These can be further fine-tuned according to the type of dictionary. For instance, a 
large general purpose dictionary may require a classification of technical and scientific 
terms so that these can be batched according to domain for compilation by an expert in 
the field. 

On the vertical axis, some means has to be found of classifying entries according to 
perceived complexity in order to estimate how long each will take to compile. This is not 
an exact science but it serves a vital purpose both in limiting the potential for overruns 
and providing the criteria by which progress will be measured. The results of such a clas- 
sification can subsequently be represented in the XML database in the form of attrib- 
utes attached to each headword (see Section 35.3.7.2 below). Having an established ‘level’ 
marker allows for batches to be created according to the experience, proficiency, and 
availability of the lexicographer. Simple entries can be pulled out to create batches for 
lexicographer training. Lexicographers with particular experience and aptitudes can be 
allocated a specific entry type, for example function words, proper nouns, encyclopae- 
dic items. The most complex entries can be allocated to the most experienced members 
of the lexicography team, although an unrelenting diet of these is not advisable. It is 
perhaps worth pointing out that timings according to entry complexity are inevitably 
exponential: an entry that has ten lexical units will take a great deal longer to compile 
than will ten entries each having only one lexical unit. 
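
The point about timings can be made concrete with a small Python sketch. The formula, the base time of twenty minutes, the growth factor, and the three level bands are all invented for illustration and are not figures from any actual project; the only property that matters is that estimated time grows faster than the number of lexical units.

```python
def estimated_minutes(lexical_units, base_minutes=20.0, growth=1.3):
    """Hypothetical time value for an entry: grows faster than linearly
    with the number of lexical units (LUs)."""
    if lexical_units < 1:
        raise ValueError("an entry needs at least one lexical unit")
    return base_minutes * lexical_units ** growth


def complexity_level(lexical_units):
    """Assign an illustrative 'level' marker from the LU count."""
    if lexical_units <= 2:
        return 1
    if lexical_units <= 6:
        return 2
    return 3


one_complex_entry = estimated_minutes(10)
ten_simple_entries = 10 * estimated_minutes(1)
print(round(one_complex_entry), round(ten_simple_entries))
```

Under these assumed values, one ten-LU entry is budgeted roughly twice the time of ten single-LU entries, which is the pattern described above.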


35.3.6 Microstructure: The Shape of the Entries 


This is an issue for the chief editor and her or his advisors. Much of the layout of the 
entries will, or at least should, be guided by what have been defined as the needs of 
the projected user of the dictionary. The decision-making process will be carried out 
via the drafting of a good number of representative sample entries: the different parts 
of speech, function words, complex core words, encyclopaedic items, specialist terms, 
and so on. Clear editorial policies worked out at this pre-planning stage will feed into 
the project DTD and the Style Guide (see Sections 35.3.7.1 and 35.3.7.5). 


35.3.7 Customization of the DWS 


35.3.7.1 The DTD 


The DTD lays down the underlying structure and associated rules for the building of the 
dictionary text. It will be the product of a collaborative process on the part of the senior 
lexicographical management team and a lexical computing expert. This is a circular process. 
Sample entries representative of the dictionary editorial policy will be created and, 
deriving from these, a complete list of entry components will be established, identifying 
the constituent elements of the macrostructure and the microstructure. 

The DTD establishes the hierarchical relationships between the various elements of 
the entry. It is thus the DTD that provides the underlying XML structure of the dic- 
tionary database and lays down the rules for what constitutes a legal XML structure, 
disallowing, via the validation facility in the DWS, any illegal use of the XML structure. 
This means what is allowed where within the hierarchy and can be further fine-tuned 
to define elements that are obligatory within a certain containing element. It will also 
be useful to define, in certain cases at least, the order in which elements should appear. 
There are no hard and fast rules governing the shape of a DTD but it is worth emphasizing 
that the more finely tuned the DTD is, the better senior editors will be able to use 
XML search strings in the DWS in order to monitor consistency and to check for errors. 
The complete list for the DANTE project (Database of Analysed Texts of English), 
referred to in Section 35.3.5.1.2, contains ninety-four defined elements and can be viewed 
on <www.webdante.com>. 
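
The kind of structural checking described here can be sketched in a few lines of Python. This is not a DTD and not the validation facility of any particular DWS: it is a hand-written analogue, with a hypothetical content model and hypothetical element names (apart from POS), showing what 'allowed where' and 'obligatory within a containing element' mean in practice. It does not check the order of elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: structural element -> (allowed children, obligatory children).
CONTENT_MODEL = {
    "entry": ({"HWD", "POS", "sense"}, {"HWD", "sense"}),
    "sense": ({"DEF", "EX"}, {"DEF"}),
}
# In this sketch, elements in capitals hold text only, never other elements.
TEXT_ONLY = {"HWD", "POS", "DEF", "EX"}


def validate(elem, path=""):
    """Return a list of structural errors found at or below elem."""
    here = f"{path}/{elem.tag}"
    child_tags = [child.tag for child in elem]
    if elem.tag in TEXT_ONLY:
        return [f"{here}: text-only element contains child elements"] if child_tags else []
    if elem.tag not in CONTENT_MODEL:
        return [f"{here}: unknown element"]
    allowed, obligatory = CONTENT_MODEL[elem.tag]
    errors = [f"{here}: <{tag}> not allowed here" for tag in child_tags if tag not in allowed]
    errors += [f"{here}: missing obligatory <{tag}>" for tag in sorted(obligatory - set(child_tags))]
    for child in elem:
        if child.tag in allowed:
            errors += validate(child, here)
    return errors


good = ET.fromstring(
    "<entry><HWD>cat</HWD><POS>n</POS><sense><DEF>a small domesticated feline</DEF></sense></entry>"
)
bad = ET.fromstring("<entry><HWD>cat</HWD><sense><EX>The cat sat on the mat.</EX></sense></entry>")
print(validate(good))
print(validate(bad))
```

The second entry is rejected because its sense has an example but no definition, which is the sort of error a finely tuned structure lets the system catch before an editor ever reads the entry.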

Most elements are ‘structural’; that is to say they contain other elements as specified 
in the DTD and never freely typed data. Conversely, the elements in capitals can only 
contain text and not other elements. 

Elements are identified as fields in the dictionary database and in the lexicographer 
interface are marked by tags. A tag identifies the element type, for example, definition, 
example, part of speech, and will be given a name that refers to this, e.g. <POS> for part 
of speech. 

Most dictionary projects would require considerably fewer elements. The rather 
giddy total of ninety-four reflects the fact that DANTE is an analytical database, for use 
by lexicographers, but also intended to be largely machine-readable. As already dis- 
cussed, the DTD must be loaded into the DWS before work can begin. Adjustments are 
often necessary as work gets underway but these should be avoided as far as possible, as 
changes can cause far-reaching problems with entry validation. Following the finaliza- 
tion of the DTD, configuration files are built in the DWS. These determine the appear- 
ance of the dictionary text on the editing screens in terms of colour and typography and 
arrangement on the editor screen. 


35.3.7.2 Attributes 


An attribute is ‘a piece of information which determines the properties of a field or tag 
in a database or a string of characters in a display’ (Oxford Dictionary of English). In the 
context of the lexicographer’s editing environment, this means attaching an attribute 
to a tag in the XML structure of a dictionary entry. The use of attributes in the editing 
environment offers a variety of means to streamline the work of the editors and also con- 
siderably reduces the risk of inconsistencies. 

In the DWS Entry Editor, the use of customized attributes in XML makes it possible 
to create large numbers of closed lists, for example linguistic labels and parts of speech, 
to be used in the dictionary. These are accessed via pull-down menus in the Attributes 
Manager on the lexicographer’s desktop. For instance, clicking on the element which, 
in the XML structure, is to contain a part of speech will bring up the menu of part-of- 
speech attributes to be used in the dictionary. The lexicographer can then simply select 
the attribute required in order to put it into the tag. How the part of speech appears in 
the text is determined by the configuration rules. Other obvious candidates for such 
lists are linguistic labels of all types. The lists will be established by the senior editors and 
this customization then incorporated as attributes into the DWS. As we have seen else- 
where, attributes can also be used to provide searchable information on entry types so 
that entries can be batched for compiling accordingly. 
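
A minimal Python sketch of both uses of attributes follows: a closed list that constrains what can be put into a part-of-speech tag, and a search on entry-level attributes to pull out entries for batching. The attribute names `level`, `type`, and `value`, the contents of the closed list, and the helper `set_pos` are assumptions for illustration, not the conventions of any particular DWS.

```python
import xml.etree.ElementTree as ET

# Hypothetical closed list of part-of-speech values, as senior editors might define it.
POS_VALUES = ("n", "v", "adj", "adv", "prep")


def set_pos(entry, value):
    """Attach a part-of-speech attribute, refusing anything outside the closed list."""
    if value not in POS_VALUES:
        raise ValueError(f"{value!r} is not in the closed part-of-speech list")
    pos = entry.find("POS")
    if pos is None:
        pos = ET.SubElement(entry, "POS")
    pos.set("value", value)


database = ET.fromstring(
    "<dictionary>"
    "<entry level='1' type='standard'><HWD>cat</HWD></entry>"
    "<entry level='3' type='function'><HWD>of</HWD></entry>"
    "<entry level='1' type='standard'><HWD>dog</HWD></entry>"
    "</dictionary>"
)

set_pos(database[0], "n")

# Search on an entry-level attribute to collect simple entries for a batch.
level_one = [entry.findtext("HWD") for entry in database.findall("entry[@level='1']")]
print(level_one)
```

Because the value is selected from a list rather than typed, variants such as 'noun', 'n.', and 'N' cannot creep into the data, which is where the reduction in inconsistency comes from.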
35.3.7.3 Proforma Entries 


Populating entries with standardized ‘proformas’ (see Atkins and Rundell 2008: 123-29) 
can speed up considerably the compilation of a dictionary. These are ‘model’ entries for 
a certain number of entry types, normally lexical sets of which the members behave in 
broadly similar ways, such as birds, fish, countries, points of the compass, etc. A list of 
the headwords belonging to each set is then drawn up and the appropriate proforma 
attached automatically to each member of the set, for use by the lexicographer. For these 
entries, the task of the lexicographer may be confined to checking and validating the 
entry, that is to say ensuring compliance with the DTD, and submitting it as complete. 
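
The automatic attachment of a proforma to each member of a lexical set might be sketched as follows in Python. The proforma content, the element names, and the function `apply_proforma` are hypothetical; the sketch simply copies a model entry once per headword and fills in the headword.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical proforma for the lexical set 'bird': structure and fixed
# attributes are pre-filled, the headword and definition are left empty.
PROFORMAS = {
    "bird": ET.fromstring(
        "<entry type='bird'><HWD/><POS value='n'/>"
        "<sense><LABEL value='ornithology'/><DEF/></sense></entry>"
    ),
}


def apply_proforma(set_name, headwords):
    """Return one pre-populated entry per headword in the lexical set."""
    entries = []
    for headword in headwords:
        entry = copy.deepcopy(PROFORMAS[set_name])  # never modify the model itself
        entry.find("HWD").text = headword
        entries.append(entry)
    return entries


birds = apply_proforma("bird", ["robin", "wren"])
print([entry.findtext("HWD") for entry in birds])
```

Each copy is independent, so a lexicographer can adjust one entry without affecting the model or its siblings.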


35.3.7.4 Other Lexical Content 


Current exciting developments in automatic sense disambiguation could lead to entries 
being populated with a basic set of lexical units (LUs) derived from the corpus. This 
holds out the prospect for a fundamental revolution in the methodology of entry writ- 
ing (see Rundell 2012). 


35.3.7.5 Style Guide 


Closely associated with the DTD is the writing of the style guide, which will provide 
detailed information on every aspect of the editorial policy of the dictionary. It must 
provide instructions on how to deal with each constituent element of the dictionary 
entry. It will explain the editorial policy concerning every aspect of the macrostructure 
and the microstructure and the implications of these. It will also fully explain how to use 
the DTD elements to structure the entry in a way that renders it ‘valid’ according to the 
DTD rules. Ideally, the style guide will be integrated, carefully indexed, into the editing 
environment for instant lexicographer access. 


35.3.7.6 System Templates 


An entry ‘skeleton’ for each entry type according to part of speech or lexical type, for 
example ‘encyclopaedic entries’, is created and loaded into the database. This means that 
when a lexicographer downloads a headword from the database for compiling, it comes 
with the basic XML elements already in place. In addition the chief editor can pre-define 
the basic ‘chunks’ of empty entry structure that lexicographers will use regularly when 
compiling an entry and these can be rapidly inserted using keyboard shortcuts. Written 
for use by the lexicographers working on the project, they should be integrated into the 
DWS for instant access by the lexicographers. 


35.4 THE EDITING ENVIRONMENT 


The Entry Editor of the DWS constitutes the environment in which the essential work 
of the lexicography is carried out. The lexicographers will be able to link with the 
dictionary database, where batches assigned to them under the workflow plan can be 
directly downloaded to their desktop, for them to work on. 

Ideally the entry editor configuration will include the possibility of viewing entries 
in XML-structure and WYSIWYG (what-you-see-is-what-you-get) formats, together 
with a summary view of an entry showing only minimal information for each LU. This is 
very helpful when navigating in complex entries. 

No two dictionary projects are identical. It will be part of the chief editor’s task 
in the planning phase to become entirely familiar with the potential of the DWS in 
order to make it operate in the best possible manner, to facilitate the task of the lexi- 
cographers, and to ensure maximum accuracy and consistency in the compiling and 
editing. For instance, the DWS should provide extent statistics and cross-reference 
checking. 

Figure 35.1 shows an entry editor window with the entries in the lexicographer’s cur- 
rent batch displayed in the first window on the left; as the entry is compiled, a summary 
of the content, here arranged by the meaning of each lexical unit, appears immediately 
above the list. The downloaded entry is displayed in WYSIWYG format in the second 
pane and in the structure format in the third. The fourth pane is used to access attribute 
lists, and insert attributes and annotations. Other functions may be incorporated here. 
FIGURE 35.1 Entry Editor 


35.5 DAY-TO-DAY PROJECT MANAGEMENT 


35.5.1 The Role of the Project Manager 


The project manager plays an essential role in the day-to-day management of the pro- 
ject. This can combine a number of project responsibilities: 


• headword list integrity; 
• creating and scheduling the batches (textflow); 
• timing of text revision; 
• project administration; 
• day-to-day liaison with the lexicography team; 
• accounting and financial reporting. 


When compilation begins, the management team will already have a cautious estimate 
of how many hours of lexicographic and editorial time they can allow for each stage of 
the dictionary project. The number they come up with is a prudent distillation of all 
the factors considered at the planning stages. Recruiting the lexicography team is also 
a planning phase task. Familiarization and training will take place between the finalizing 
and loading of the data into the editing environment and the scheduled start of 
entry writing. This is a busy stage for project managers, as even the most talented and 
experienced lexicographers may take time to fully digest the project specific aspects of 
the style guide and DTD, and immediately after orientation progress can be bumpy. The 
early weeks serve as a shake-down for the project systems and the administrative load 
increases as the team of lexicographers settle to their task. Extra support is commonly 
required on the IT and editorial fronts. Some of this is about on-going training, and 
some about fine-tuning style guide issues taking into account early feedback from the 
lexicography team. Down-time for any reason may call for a reshaping of the schedule to 
bring progress in line with contractual deliverable timetables and the impact of delay on 
budget spending is a fundamental consideration. From the outset, data on compilation 
output rates needs to be quantified and compared to estimates. This is where managers 
will first see any deviation from estimated progress and can act promptly. 
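
A comparison of output against estimates need not be elaborate. The following Python sketch assumes that, for each reporting period, the project records the estimated time value of the entries completed and the hours actually spent on them; the figures, the ten per cent tolerance, and the function name are invented for illustration.

```python
def deviation(estimated_hours, actual_hours):
    """Relative overrun (positive) or underrun (negative) against the estimate."""
    if estimated_hours <= 0:
        raise ValueError("estimated hours must be positive")
    return (actual_hours - estimated_hours) / estimated_hours


# (period, estimated hours for the entries completed, hours actually spent)
weekly = [("week 1", 40.0, 52.0), ("week 2", 40.0, 43.0)]

TOLERANCE = 0.10  # hypothetical threshold for raising a flag
flagged = [label for label, estimated, actual in weekly
           if abs(deviation(estimated, actual)) > TOLERANCE]
print(flagged)
```

In this invented data only the first week is flagged, which is consistent with the bumpy start described above.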

Dictionary design and system engineering are not covered here, but these are processes 
that would be defined at the planning stage and costs largely agreed. The editorial 
phase or phases of the project are where the prospect for commercial viability comes 
under strain. A large chunk of the overall budget will already have been spent during the 
planning stage on acquiring the linguistic and computational resources. In this context 
the budget for lexicography is the big wild card. Financially, outgoings should stabilize 
during the editorial phases but the potential for overrunning either on time or budget 
here is the biggest variable. Many a dictionary project has foundered unfinished due to 
unrealistic planning in the first instance. 
35.5.2 Text Management 


Textflow is the journey of each entry and of the dictionary as a whole from its initial 
form to the finished text. It is up to the project manager to structure the progress of the 
data through the editing environment and maintain the integrity of the headword list 
there. A DWS facilitates the monitoring of the entries and allows comparison of versions 
of entries as well as treatment of specific types of entry or classes of information. Using 
the XML structure in place, the database can be interrogated for both editorial informa- 
tion relating to the headword list, and for management statistics such as the scheduling 
of a sub-task, or the proportion of text that is waiting for editorial feedback. 

The headword list is batched and sent for compilation: batches can be created accord- 
ing to entry type or complexity using the XML attributes created at the classification 
stage, matching this against the time value derived from the estimated number of lexical 
units in a particular entry (see Atkins and Grundy 2006). The main advantage of using 
this approach is that it allows managers complete flexibility to make the most of the var- 
ied skills and working patterns of the changing team. This information is also used later 
as the basis for project reporting. 
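
One simple way of turning the classified headword list into batches is sketched below in Python. It assumes each entry carries a level and a time value in minutes, groups entries by level, and fills each batch up to a time limit. The function name, the data, and the greedy filling strategy are illustrative assumptions and do not describe the batching logic of any particular system.

```python
def make_batches(entries, hours_per_batch):
    """Group (headword, level, minutes) tuples by level, then fill batches
    up to the time limit. An entry longer than the limit gets its own batch."""
    limit = hours_per_batch * 60
    by_level = {}
    for headword, level, minutes in entries:
        by_level.setdefault(level, []).append((headword, minutes))

    batches = []
    for level in sorted(by_level):
        current, used = [], 0.0
        for headword, minutes in by_level[level]:
            if current and used + minutes > limit:
                batches.append({"level": level, "headwords": current, "minutes": used})
                current, used = [], 0.0
            current.append(headword)
            used += minutes
        if current:
            batches.append({"level": level, "headwords": current, "minutes": used})
    return batches


entries = [("make", 3, 300), ("map", 1, 20), ("mat", 1, 30), ("mar", 1, 40), ("mark", 3, 200)]
batches = make_batches(entries, hours_per_batch=1)
for batch in batches:
    print(batch)
```

Keeping the level on each batch means a manager can match batches to the experience and availability of individual lexicographers, and the summed time values can feed directly into progress reports.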
FIGURE 35.2 Search screen in the DPS 
Figure 35.2 shows batch number 13 comprising entries from designated levels 1 to 
9. The data record indicates that the first entry is ‘locked’; that is to say, is currently being 
compiled by a member of the team. Information is also given regarding the extent of the 
entry in characters, words and meanings along with the time value estimated prior to 
compilation. 

Figure 35.3 shows batches from a section of the letter ‘M’ at levels 10 to 12: the steps in 
the workflow definition (in this case there are three) are visible for each batch, together 
with the name of the lexicographer or editor assigned to the batch, the percentage of the 
work that is complete in each step, and once again the time value of the batch. 

In terms of scheduling text revision, using the software in the first instance to locate 
each entry requiring editing means that editors and managers can quickly identify defi- 
ciencies and overruns, helping to minimize subsequent edits of the entries (see Convery 
et al. 2010). XML can be harnessed to drill down through the returned data both hori- 
zontally (A-Z) and vertically (entry type) for quality control and to indicate where any 
problems may lie. This puts the emphasis on ‘smart’ editing and developing the respon- 
siveness of the senior editors and project managers to issues thrown up by the lexicog- 
raphy. Typically the headword list will not be static and this is dangerous territory both 
editorially and for budget-holders: there will be additions as well as exclusions as the 
project progresses, so maintaining the headword list in the dictionary database online 
improves communication between lexicographers and editors as well as helping to 
deliver robust time and budget management. 
FIGURE 35.3 Workflow screen in the DPS 


35.6 DAY-TO-DAY EDITORIAL MANAGEMENT 


35.6.1 Overview 


We have seen that the project manager's role in handling all matters concerning textflow, 
schedules, and budgets leaves the chief editor largely free to focus on editorial concerns. 
This does not mean, however, that the fields of action of the chief editor and the project 
manager are neatly demarcated. Between them they will discuss which lexicographer 
will be best suited to a particular task, how to redress a progress shortfall, and how to schedule
text revision. In present-day dictionary management conditions, the likelihood is that 
the team will be geographically scattered; on the recent DANTE project, this was the 
case for the present writers, where all the contributors and editors were ‘remote’ work- 
ers. Developing project wikis or intranets and using collaborative document sharing are 
all good strategies for maximizing communication between team members. 
The chief editor’s primary responsibilities, once the project is up and running, are:

• reading batches of entries according to the lexicographer and supplying regular
feedback;

• reading batches according to the batch type as defined in the pre-project stage and
supplying feedback; these could include, for example, function word entries,
encyclopaedic entries, and very complex entries;

• running search strings across a defined stretch of the XML database as this is
completed and correcting errors on the spot;

• using the DWS to monitor and control extent.


35.6.2 Checking Entries According to the Lexicographer 
and Supplying Feedback 


Feedback is time-consuming but has to be kept up throughout the project. A chief edi- 
tor running an in-house project is able to gather the editorial team together on a regular 
basis, normally once a week, in order to go over lexicography and style issues. When 
working at a distance, time must be found to keep regular contact with every lexicog- 
rapher, on an individual basis. Reading through a batch compiled by a particular lex- 
icographer gives the chief editor a feel for the person’s work and makes it possible to 
spot systematic errors or weaknesses. In the case of novice or relatively inexperienced 
lexicographers, the chief editor will be able to spot cases where an area of the style guide 
has been misunderstood—sometimes because it has not been adequately explained. 
Further training may be required at this point. After the project has been running for a 
few months, it may be helpful to issue a supplement to the style guide in order to clarify 
such problems. A mentoring system can be put in place, whereby the more experienced 
lexicographers can offer support to novice or new members of the team, providing
editorial guidance and monitoring their work.


35.6.3 Using Search Strings 


A good DTD will enable extremely fine-tuned searches across the database. On the 
DANTE project, we were running almost 200 search strings on a systematic basis before
passing each batch of text. Once such a system has been set in place, it cuts down very
significantly on, and may even replace, the formerly onerous task of proofreading. The
chief editor will be able to run a search string in order to detect a specific anomaly and
obtain a list of the entries requiring further revision. An example of this is a search for
empty meaning tags in the letter ‘T’, as shown here. The results will show cases where the
lexicographer has omitted to type in the meaning of the LU.


%<MEANING:!(.+),<hwd:(4#[t].*) 
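The search string above uses the syntax of one particular system. As a rough, tool-independent illustration of the same check, the following Python sketch scans an XML fragment for entries whose meaning element is empty. The element names (entry, hwd, meaning) and the sample data are assumptions made for this example and are not taken from the DANTE DTD.

```python
import xml.etree.ElementTree as ET

# Invented sample data; the element names are hypothetical, not the DANTE DTD.
SAMPLE = """<dictionary>
  <entry><hwd>table</hwd><sense><meaning>a piece of furniture</meaning></sense></entry>
  <entry><hwd>tackle</hwd><sense><meaning></meaning></sense></entry>
  <entry><hwd>tally</hwd><sense><meaning>   </meaning></sense></entry>
  <entry><hwd>sable</hwd><sense><meaning/></sense></entry>
</dictionary>"""


def entries_with_empty_meaning(xml_text, initial):
    """Return headwords starting with `initial` that have an empty meaning."""
    root = ET.fromstring(xml_text)
    flagged = []
    for entry in root.iter("entry"):
        headword = (entry.findtext("hwd") or "").strip()
        if not headword.lower().startswith(initial.lower()):
            continue
        for meaning in entry.iter("meaning"):
            # Missing text and whitespace-only text both count as empty.
            if not (meaning.text or "").strip():
                flagged.append(headword)
                break  # one empty meaning is enough to flag the entry
    return flagged


print(entries_with_empty_meaning(SAMPLE, "t"))
```

The returned list plays the role of the ‘entries requiring further revision’; a production check would need to follow the project's real DTD and would probably report the sense number as well as the headword.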


Another type of search is rather more time-consuming, though much less so than 
undertaking several passes of linear text proofreading. This involves obtaining a list of 
entries bearing a particular tag with particular contents, for example, all entries that 
have the register marker vulgar as opposed to v.informal, which is not always a simple 
distinction. A scan of the list should reveal any that should not be so marked. 

When the dictionary is nearing its final stages, the skills of the more experienced lexi- 
cographers can be pulled in to help with these tasks. 


35.7 THE TEAM 


35.7.1 The Lexicographers 


As we have seen, it would be unusual nowadays for a team of lexicographers to work 
together as permanent staff in one physical location. A much more likely scenario is that 
the team will be made up of part-time, sub-contracted contributors working remotely 
and in touch with managers almost exclusively by e-mail. The total number of contribu- 
tors on the team is partly a function of the overall timetable, but will also reflect the 
different tasks required for the particular type of dictionary: specialist terminology, 
phonetics, grammar information, usage notes, illustrations, etc. 

It is possible for a team to be too big, of course: resources spent on training and later
administration, as well as the impact on team communication, are relevant factors. It
can also be difficult to induct new personnel once a project is at an advanced stage, 
so there will need to be some preparedness on the part of project managers to cope 
with this. 
35.7.2 Training


A team starting out on a new project will consist at least in part of people who already 
have considerable experience in lexicography. However, we can assume that, either at the 
outset or along the way, training of relative novices may be required. Either way, it is clear
that everyone will need initial orientation of some sort, followed by ongoing feedback. 
The way that training is organized will obviously vary greatly according to the nature of 
the project. Sue Atkins has frequently put the essential distinction between the differ- 
ent types of project thus: (i) the two-process monolingual dictionary: analysis, synthesis 
and (ii) the three-process bilingual dictionary: analysis, transfer, synthesis (Atkins and 
Rundell 2008: 98-9). In both cases, the analysis of the corpus creates a ‘framework’ and 
in both cases the synthesis stage pulls together the final entry. In the case of a bilingual 
dictionary, the transfer phase puts in place a carefully judged translation of the frame- 
work. For our purposes here, we need not distinguish between the general monolingual 
dictionary, the learner’s dictionary, and the special purposes dictionary, although it is 
clear that in specific areas such as definition writing, the guidelines will be different. 


35.7.2.1 The Training Environment 


Our experience has demonstrated the benefits to be gained from setting in place an 
initial training session in one venue. This makes for invaluable team bonding, as well 
as fruitful discussion between team members. A large new project may require at least 
three days of intensive training, a smaller project less. It may also be necessary, or at least 
desirable, to get the team together again three to four months later so that problems 
encountered thus far can be aired and discussed. Following feedback and agreement at 
this point, the style guide should be completely finalized. 

During induction, access will be required to the chosen DWS, the Corpus, and the 
CQS. The team members should have in advance: 


• the training program;

• the project outline;

• the proto style guide;

• a brief outline of the computational environment.


35.7.2.2 Training Content 

The aim will be to demonstrate the software and how to use it in relation to the project 
specifications and style guide. In the practical sessions, the training team can then circu- 
late amongst the trainees and help them to resolve problems hands-on. 

35.7.2.3 Analysis: Monolingual Framework Building


Training here will include at a minimum: 


• explanation of the aims of the project, the nature of the product of the project and
what its uses will be;

• introduction to the style guide and explanations of the linguistic approach to all the
elements of the macro- and microstructure of the dictionary;

• introduction to the dictionary software and the DTD: how the various entry
elements are encoded and how to work with them in the DWS;

• introduction to the Corpus and the CQS: how the corpus will be used by the
lexicographers;

• practical work;

• project administration and IT support.


Each of the above will be expanded according to the nature of the project. The broad list
of topics will be extended to include more fine-tuned lexicographical topics
such as sense disambiguation, choosing corpus examples, what is an MWE, what is an
idiom, how labels are used, and so forth.


35.7.2.4 Working with a Framework: Monolingual Project 


Here editors will be working on constructing final, fully publishable dictionary entries, 
using an existing framework. There are several possible scenarios. The editors may 
themselves have participated in the analysis phase and have moved on, in succession or
in parallel, to the final editing phase. Alternatively, they may be working on a framework
such as DANTE that has been independently originated. In the former case, the lexi- 
cographers will need to be thoroughly briefed on the final dictionary policy and what it 
involves: definition style, use of labels, grammatical information, and so on. All this will 
have been laid out in the style guide but additional editorial workshops as a project gets 
to this stage are indispensable. Lexicographers working with a framework with which 
they are completely unfamiliar will need more intensive training, closely akin to what is 
described above, although in this case the style guide will lay down detailed guidelines 
for using the framework. Lexicographers will, however, still need thorough training in 
the text structure and how to use it. They will learn to use the various building bricks 
of the framework in order to piece together a final dictionary entry. Since they will be 
working from a corpus-based framework, the use of the corpus may be minimal. 


35.7.2.5 Working with a Framework: Bilingual Project 
35.7.2.5.1 Transfer 


Translators working on a dictionary project will again need to understand the structure 
of the text and the principles according to which the framework has been constructed. 
Group training sessions are again to be recommended and will include, as for the analy- 
sis phase above, an explanation of the aims of the project and of its management struc- 
ture. In particular, they will need training in specific aspects of dictionary translation 
(see Atkins and Rundell 2008: 465-83):


• how the translation fits into the framework structure and how the XML tagging is
organized for use by the translator;

• what is a ‘good’ dictionary translation and how translating for a dictionary differs
from translating free text;

• how to use the metalinguistic information given in the text (domain labels, register
markers, and so on);

• criteria for determining how much of the framework material to translate;

• criteria for translating corpus examples;

• using the target-language corpus as an aid to translation;

• again, time should be allowed for practical sessions.


35.7.2.5.2 Synthesis: Building the Bilingual Entry

Here it is safe to assume that the editors building the final bilingual entries for the dic- 
tionary will have worked either on the analysis stage or the translation stage of the pro- 
ject and thus will be entirely familiar with the structure of the framework and the aims 
of the project. Training points include: 


• the user profile: how the dictionary is to be used and by whom will influence
considerably the way the entries are constructed (see Atkins and Rundell 2008: 486-9);

• entry structure and using the final editing style guide: the final dictionary entries
will certainly be organized differently and much more concisely than the
framework entries;

• the role of translated examples in the dictionary and how to modify framework
examples;

• practical sessions.


35.8 CONCLUSION 
This chapter inevitably constitutes something of a race through the many complex
issues involved in managing a dictionary project. We have tried to give an overview of 
these, focusing on recent technological advances that can now be deployed to make the 
processes, lexicographical and logistical, more efficient. But the changes do not stop 
with efficiency. There can be no doubt that they have made—and continue to make— 
the working life of Johnson's ‘harmless drudges’ richer and more satisfying. It remains
of course to be seen whether, as technology in the field continues to advance by leaps 
and bounds, the lexicographer will become an endangered species, but that is another
matter...
36.1 INTRODUCTION


In his ‘Guides to tomorrow’s English’ (1998b), Tom McArthur considered English dictionaries
of the past, present, and future. ‘Today’s English’, starting from the early nineteenth
century, reflected the influence and range of the language resulting from the
growth in international trade, the Industrial Revolution, the expansion of the British 
Empire and the increasing power of the United States. In this period English lexicog- 
raphy was primarily associated with just three locations: Oxford (Oxford University 
Press), Edinburgh (Chambers), and Springfield, Massachusetts (the Merriam publish- 
ing company). 

The period of ‘tomorrow's English’ had already begun for McArthur, however, with 
the increase in the use of English as an international language, and the development 
of English dictionary resources specifically for foreign language learners. McArthur 
identified eight pragmatic developments associated with the English dictionaries 
of ‘tomorrow’. Alongside globalization of the market and localization to meet the language
learning needs of a particular country or group of countries, thematization marks
a move away from alphabetical formats, bilingualization and semi-bilingualization 
involve the inclusion of data from languages other than English, and nationalization and 
regionalization mean that dictionaries are being developed outside Britain and the USA, 
to cater for the interests and needs of users who may or may not be ‘traditional native
speakers’. The eighth pragmatic development McArthur identified, electronicization,
began with the use of computers to assist the lexicographical process, and at the time 
of McArthur’s article had not yet fully overturned the methods of storage and retrieval
long associated with print dictionaries. 

All the developments on McArthur’s list reflect our shifting relationships with the mul- 
tiple language varieties and the multiple environments in which dictionary users oper- 
ate. However, it is undoubtedly electronicization that has had the greatest impact since 
1998, influencing all other aspects of dictionary creation and use. CERN opened up the
World Wide Web as a possible site for e-dictionaries in 1993, but in the 1990s lexicographers’ focus
was largely on the collection and exploration of digital language data, and in 1998 there 
were still only about 400 English dictionaries on the World Wide Web (Li 1998: 21). Most 
e-dictionaries were still being distributed on CD-ROM or on standalone mobile devices 
(pocket electronic dictionaries). In the early 2000s, however, internet access gradually 
became faster and more reliable as high speed broadband connections became available.
As a result, thousands more digital versions of print dictionaries went online. At the
same time wiki software became an open source tool, leading to the rapid expansion of 
‘collaborative lexicography’ where dictionary information is created and edited by users 
(Nesi 2009; Krek 2011; Meyer and Gurevych 2012). In this new environment Wi-Fi con- 
nected mobile devices are overtaking pocket electronic dictionaries containing licensed 
dictionary content, and publishers are having to rethink their entire marketing strategy. 
The new web technology seems to promote a give-away culture where files are shared, 
anonymous amateur editors construct reference works, and open source operating sys- 
tems are created and distributed free of charge (Hall 2008: 206). Yet this same technology 
is capable of deriving ‘big data’ with enormous commercial potential from social media, 
internet archives, and internet search indexing. Whether such data can be harnessed to 
the advantage of dictionary publishers and lexicographers remains to be seen. 

Recently, a survey conducted by Müller-Spitzer et al. (2011) asked English and
German participants to rate ten aspects of online dictionary usability. The results indi- 
cated that reliability of content, clarity, up-to-date content, and speed were regarded as the 
most important features. Long-term accessibility seemed to be of medium importance 
to users, and there was far less enthusiasm for links to other dictionaries, adaptability, 
suggestions for further browsing, and multimedia content. The researchers noted that 
‘the classical criteria of reference books (e.g. reliability, clarity) were both ranked and 
rated highest, whereas the unique characteristics of online dictionaries (e.g. multimedia,
adaptability) were rated and ranked as (partly) unimportant’ (Müller-Spitzer et al.
2011: 207). 

Users’ attitudes towards online dictionaries could be changing, however. A second 
survey (Koplenig 2011) found that users were more inclined to value multimodal and 
user-adaptive interfaces once these features had been explained to them more clearly. 
Moreover Kilgarriff (2005) notes the move away from ‘status symbol’ monolingual dictionaries,
and Rundell (2011) claims that amongst digital native users (in their late teens
and early twenties) dictionaries are no longer regarded as authoritative in the same way 
as before. 

Dictionary portals such as OneLook and Allwords now include links to other diction- 
aries as standard, serving as metasearch engines across multiple sources which might 
contain the information the user requires. The OneLook site, for example, indexes 1,062
dictionaries of varying types, ages, and reliability, including bilingual dictionaries and 
dictionaries for learners of English. More radically, portals may supplement their dic- 
tionary resources with suggestions for further browsing in the form of examples from 
media websites (Nesi 2012) and/or pedagogical materials (Campoy Cubillo 2002). 


36.2 COMPUTER GENERATED 
LEXICOGRAPHICAL INPUT 


Grefenstette (1998) asked whether there would still be lexicographers in the year 3000. 
He listed the four steps undertaken by the ideal lexicographer, as identified by Kilgarriff 
(1992): 


1. gather corpus of citations for a given word;

2. divide the citations into clusters;

3. decide why the cluster members belong together;

4. code the conclusions into a dictionary definition.


Steps 1 and 2, Grefenstette argued, could be achieved by computers, but steps 3 and 4 
would remain the work of humans because they entail ‘drawing distinctions and con- 
trasts between shared experiences and expectations, explaining what makes this group 
different from other groups that the human user knows’ (1998: 38). 
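To make the mechanical character of steps 1 and 2 concrete, here is a deliberately small Python sketch. It gathers the sentences containing a word and then groups them greedily by shared context words. The corpus, the stopword list, the similarity threshold, and the clustering method are all assumptions chosen for illustration; they are not a description of any system discussed by Grefenstette or Kilgarriff.

```python
import re

# Invented four-sentence corpus, for illustration only.
CORPUS = [
    "She sat on the bank of the river and watched the water.",
    "The bank approved the loan and opened a new account.",
    "Fishermen lined the river bank at dawn.",
    "He paid the cheque into his account at the bank.",
]

STOPWORDS = {"the", "a", "of", "and", "on", "at", "his", "he", "she", "into", "new"}


def gather_citations(corpus, word):
    """Step 1: collect every sentence that contains the word."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [sentence for sentence in corpus if pattern.search(sentence)]


def context_words(sentence, word):
    """Content words of a sentence, excluding the target word itself."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {t for t in tokens if t != word and t not in STOPWORDS}


def cluster_citations(citations, word, threshold=0.1):
    """Step 2: greedy clustering by Jaccard overlap of context words."""
    clusters = []  # each cluster: {"words": set, "citations": list}
    for citation in citations:
        ctx = context_words(citation, word)
        best, best_score = None, 0.0
        for cluster in clusters:
            union = ctx | cluster["words"]
            score = len(ctx & cluster["words"]) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best["citations"].append(citation)
            best["words"] |= ctx
        else:
            clusters.append({"words": set(ctx), "citations": [citation]})
    return [cluster["citations"] for cluster in clusters]


for group in cluster_citations(gather_citations(CORPUS, "bank"), "bank"):
    print(group)
```

On this toy corpus the ‘river’ and ‘account’ citations fall into separate groups. Nothing in the code says why the members of each group belong together, which is exactly the work of steps 3 and 4.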

Rundell (2009) reflects on Grefenstette’s predictions in terms of the way technol- 
ogy has decreased the need for intervention in the lexicographical process, and how 
the advent of large corpora means that dictionaries are no longer solely concerned with
words, but also with language systems and syntagmatic networks. The four-step lexico- 
graphical process has now become too simple, a view that Leroyer (2011: 122) seems to 
support when he argues that lexicography should no longer be regarded as a subset of 
applied linguistics, but as ‘a unique discipline at the crossroads of social and information 
sciences and technology’. 

In 2002 Esposito envisaged a ‘new class of lexical applications... based on machines
talking to machines’. These applications would eliminate the need for human mediation,
but would still draw on the dictionary databases that human lexicographers had
produced. An example of this type of application is the now defunct Casey’s Snow Day
Reverse Dictionary, which used ‘n-gram analysis’ (‘a method of matching documents
based on the statistical similarity of occurrences of ... combinations of letters’) to match
a meaning provided by the user to entries in the Hypertext Webster Interface (Nesi 1998).
A modern equivalent of Casey’s Snow Day Reverse Dictionary is the reverse dictionary
provided by OneLook, which searches the full text of hundreds of online reference sites to
find definitions conceptually similar to the words the user types in.
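The idea of matching a user's description to definitions by letter combinations can be sketched in a few lines of Python. The example below compares character trigrams using the Dice coefficient; the choice of trigrams, the similarity measure, and the three sample definitions are assumptions made for illustration, and no claim is made that either of the reverse dictionaries mentioned above worked in this way.

```python
def char_ngrams(text, n=3):
    """Set of overlapping letter sequences of length n, after normalizing spaces."""
    cleaned = " ".join(text.lower().split())
    return {cleaned[i:i + n] for i in range(len(cleaned) - n + 1)}


def dice(a, b):
    """Dice coefficient between two sets (0.0 when either set is empty)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# Invented mini-dictionary, for illustration only.
DEFINITIONS = {
    "umbrella": "a folding canopy that protects a person from rain",
    "kettle": "a container with a spout used for boiling water",
    "ladder": "a frame with rungs used for climbing up or down",
}


def reverse_lookup(query, definitions, n=3):
    """Rank headwords by n-gram similarity between the query and each definition."""
    query_grams = char_ngrams(query, n)
    scored = [
        (dice(query_grams, char_ngrams(definition, n)), headword)
        for headword, definition in definitions.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [headword for score, headword in scored if score > 0]


print(reverse_lookup("something that protects you from the rain", DEFINITIONS))
```

Because the comparison works on letter sequences rather than meanings, a query phrased with different vocabulary from the definition will score poorly, which is one reason later tools moved towards conceptual similarity.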
As part of this trend away from alphabetical searches and towards meaning-related
queries—thematization, as predicted by McArthur (1998b)—e-dictionaries are also 
using large lexical databases known as ‘wordnets’ to provide onomasiological search 
routes. Wordnets group words into interlinked sets of cognitive synonyms (Fellbaum 
2006) and were originally developed for use by artificial intelligence systems. The com- 
mercial online dictionaries a2zDefined, Bee Dictionary, Memidex, and Wordnik all 
draw data from the English WordNet developed at Princeton University, and the online 
Danish dictionary Den Danske Ordbog (Trap-Jensen 2010) offers searches for related
words through DanNet, the Danish wordnet. Wordnets are developed by human teams 
with reference to dictionaries produced by human lexicographers, but the informa- 
tion that e-dictionaries draw from wordnet sources is not manually edited. Trap-Jensen 
(2010) compares the thesaurus function in Den Danske Ordbog with that of the
Macmillan Online Dictionary, which is compiled by lexicographers. He finds strengths 
and weaknesses in both systems. A wordnet can generate too many options, whereas 
manual editing can select those candidates which are most likely to be relevant to the 
user. On the other hand wordnets can identify relevant semantic links which cut across 
categories in a traditional manually-constructed thesaurus. 
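A toy data structure shows how a wordnet supports this kind of meaning-based lookup. In the Python sketch below, each synonym set points to a broader set, and a query returns synonyms, broader terms, and sister terms. The identifiers, words, and links are invented for the example and are far simpler than the Princeton WordNet or DanNet.

```python
# Toy data only: identifiers, words, and links are invented for illustration.
SYNSETS = {
    "car.1": {"words": ["car", "automobile", "motorcar"], "broader": "vehicle.1"},
    "truck.1": {"words": ["truck", "lorry"], "broader": "vehicle.1"},
    "vehicle.1": {"words": ["vehicle"], "broader": None},
}


def related_words(word, synsets=SYNSETS):
    """Return synonyms, broader terms, and sister terms for a word."""
    result = {"synonyms": set(), "broader": set(), "sister_terms": set()}
    for synset_id, synset in synsets.items():
        if word not in synset["words"]:
            continue
        result["synonyms"].update(w for w in synset["words"] if w != word)
        parent = synset["broader"]
        if parent is None:
            continue
        result["broader"].update(synsets[parent]["words"])
        # Sister terms: words in other synsets sharing the same broader synset.
        for other_id, other in synsets.items():
            if other_id != synset_id and other["broader"] == parent:
                result["sister_terms"].update(other["words"])
    return {kind: sorted(words) for kind, words in result.items()}


print(related_words("car"))
```

Even in this tiny example every linked word is returned without any ranking, which hints at the problem of too many options noted above.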

Language problems can also be addressed by automatically interrogating web corpora, 
without human intervention at any stage in the process. Whilst it might not be possible 
to derive old-style dictionary definitions directly from corpus data, algorithms are being 
developed to identify definition-like explanatory sequences within large collections of 
text. Dictionary sites such as Wordnik mine corpora for sequences which provide infor- 
mation about word meaning, rather than simply examples of word use (McKean 2011). 
These sequences are imported to Wordnik in place of definitions, as the Wordnik team do 
not define words themselves, and do not accept definitions contributed by users. 
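One simple way to look for definition-like sequences is to match a handful of surface patterns such as ‘X is a ...’ or ‘X refers to ...’. The Python sketch below does this with regular expressions. The patterns and the example sentences are assumptions invented for illustration; the source does not describe the algorithms Wordnik actually uses, and real systems are likely to be considerably more sophisticated.

```python
import re


def find_explanatory_sentences(word, sentences):
    """Return sentences in which the word appears in a definition-like frame."""
    w = re.escape(word)
    # Hypothetical surface patterns; a real system would need many more.
    patterns = [
        rf"\b{w}\b,? (?:is|are) (?:a|an|the) ",
        rf"\b{w}\b,? (?:which|that) means ",
        rf"\b{w}\b (?:refers to|is defined as) ",
        rf"\bknown as (?:a |an |the )?{w}\b",
    ]
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return [s for s in sentences if any(p.search(s) for p in compiled)]


# Invented sentences, for illustration only.
SENTENCES = [
    "A mumpreneur is a woman who combines running a business with looking after her children.",
    "My neighbour became a mumpreneur last year.",
    "The term mumpreneur refers to a mother who runs her own business.",
    "Everyone loves a mumpreneur, apparently.",
]

for sentence in find_explanatory_sentences("mumpreneur", SENTENCES):
    print(sentence)
```

The second and fourth sentences are ordinary examples of use and are not returned, which mirrors the distinction drawn above between information about meaning and mere examples of word use.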

Automatically extracted explanations will be adequate in some consultation con- 
texts but not in others, depending on the task and levels of user expertise and language 
knowledge. McKean (2011) admits that data mining techniques are not useful as a means 
of uncovering word etymology, for example, because of the unreliability of the folk ety- 
mologies to be found in non-specialist texts. 

The inclusion of the latest new words may not greatly improve the usefulness of a 
dictionary, but is important from a marketing perspective, as Rundell and Kilgarriff 
(2011) note, and there is clearly a public appetite for information about words reflecting 
new phenomena and societal change. The web analytics site Google Insights for Search 
(<www.google.com/insights/search/#>) for June 2011 to June 2012 reveals that many 
of the most frequent searches for the term ‘English dictionary’ led to articles about the
acceptance of new terms in well-known authoritative publications, for example ‘mumpreneur’
(Cameron 2011) in the Collins English Dictionary, and ‘LARPing’, ‘scratchiti’
(Taylor 2012), and ‘squeezed middle’ (Zafar 2011) in the Oxford English Dictionary.
Nowadays a new word can be added to an e-dictionary in a fraction of the time it takes 
a lexicographical team to compile and publish a print dictionary entry. Rundell (2011) 
contrasts modern e-dictionary practices with the flow chart on the Oxford Dictionaries 
site (undated) showing the elaborate and increasingly outdated process by which a 
word is considered for admittance in an Oxford Dictionary. Human web-page editors
can now by-pass much of this process whilst still setting criteria for word inclusion. The 
collaborative e-dictionary Wiktionary, for example, allows contributors to add new 
entries, but requires that neologisms should be attested through widespread use in ‘a 
well-known work’ or ‘permanently recorded media, conveying meaning, in at least three 
independent instances spanning at least a year’.
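The second criterion quoted above is precise enough to be expressed as a check. The Python sketch below treats each distinct source as one independent instance and measures the span between the earliest citations of those sources. Both simplifications, along with the function name and the sample citations, are assumptions made for illustration and do not reproduce Wiktionary's own procedures.

```python
from datetime import date


def meets_attestation_rule(citations, min_instances=3, min_span_days=365):
    """Check for enough independent instances spread over a long enough period.

    `citations` is a list of (source, date) pairs. Independence is
    approximated by counting each distinct source once (an assumption).
    """
    earliest_by_source = {}
    for source, when in citations:
        if source not in earliest_by_source or when < earliest_by_source[source]:
            earliest_by_source[source] = when
    if len(earliest_by_source) < min_instances:
        return False
    dates = list(earliest_by_source.values())
    return (max(dates) - min(dates)).days >= min_span_days


# Invented citation records, for illustration only.
well_attested = [
    ("novel", date(2010, 1, 15)),
    ("newspaper", date(2010, 9, 1)),
    ("journal", date(2011, 3, 20)),
]
single_source = [
    ("blog", date(2010, 1, 15)),
    ("blog", date(2011, 2, 1)),
    ("blog", date(2012, 3, 1)),
]

print(meets_attestation_rule(well_attested))
print(meets_attestation_rule(single_source))
```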

However, direct links to news databases and social media sites also mean that new 
words can be identified and analysed computationally, and rapidly incorporated into some 
e-dictionaries with little or no editorial intervention. This may lead to tension between 
the demands for up-to-date content and reliability, of course. Systems for the automatic 
extraction of neologisms have to overcome many problems, as Halskov and Jarvad (2010) 
point out, because expressions that the software will flag as new may actually be trans- 
parent, transient, and/or idiosyncratic, and therefore lexicographically insignificant. 
Without humans to assess the lexicographical worth of potential new entries, e-dictionar- 
ies can rapidly become populated with words and expressions that have no real currency. 
Problems resulting from the lack of human intervention are particularly noticeable in 
the less prestigious varieties of ‘alternative’ bilingual e-dictionary which are popular with 
English language learners in East Asia. Nesi (2012) identifies in such dictionaries archa- 
isms and nonce formations that are not differentiated in any way from items essential to an 
English language learner. It is sad to think of users wasting their time memorizing vocabu- 
lary that is relatively worthless to them from a communicative perspective. 
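The filtering problem can be illustrated with a minimal Python sketch of candidate extraction: tokens absent from the headword list are counted, and only those that recur across more than one source are kept, as a crude guard against idiosyncratic coinages. The thresholds, the tiny headword list, and the sample documents are assumptions for illustration; this is not the method of Halskov and Jarvad (2010), and it does nothing to detect transparent or transient formations.

```python
import re
from collections import defaultdict


def neologism_candidates(documents, headwords, min_sources=2, min_count=3):
    """Return unknown tokens that are frequent and attested in several sources.

    `documents` is a list of (source, text) pairs; `headwords` is a set of
    lower-case forms already covered by the dictionary.
    """
    counts = defaultdict(int)
    sources = defaultdict(set)
    for source, text in documents:
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in headwords:
                counts[token] += 1
                sources[token].add(source)
    return sorted(
        token
        for token in counts
        if counts[token] >= min_count and len(sources[token]) >= min_sources
    )


# Invented data, for illustration only.
HEADWORDS = {"the", "trend", "grows", "as", "one", "opens", "a", "shop",
             "spoke", "today", "my", "day"}
DOCUMENTS = [
    ("news-a", "The mumpreneur trend grows as one mumpreneur opens a shop"),
    ("blog-b", "A mumpreneur spoke today"),
    ("blog-c", "My blorptastic blorptastic blorptastic day"),
]

print(neologism_candidates(DOCUMENTS, HEADWORDS))
```

The nonce word that one writer repeats three times is rejected, but a human editor would still be needed to judge whether the surviving candidate has real currency.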

Some bilingual e-dictionary sites also supplement their dictionary entries with auto- 
matically generated illustrative text. For example the Jin Shan Ci Ba, an enormously 
influential e-dictionary in mainland China, works with the machine translation 
device Jinshan Kuaiyi (Nesi 2012). Mair (2007) blames this system for the production 
of ‘absurdly crude English mistranslations in bizarrely inappropriate contexts’. A further
problem is the incidence of meaningless machine generated sentences, originally
posted on internet discussion sites to fool filters into accepting spam messages, but now 
sometimes automatically incorporated in online dictionary sites. Postings to USENET, 
the Internet discussion system, were used to supplement entries in the Doosan Dong-a 
Prime English dictionary on the Daum South Korean web portal, for example. Some of 
the postings were genuine, but others, such as ‘What did Francis arrive the cup before
the dark coffee?’ were simply random sequences which make no sense (Nesi 2012: 367). 


36.3 DIGITAL MIGRATION 
AND THE FRAGMENTED 
DICTIONARY MARKET 


Of course, automatic translation and the automatic extraction of lexicographi- 
cal material are techniques that bring commercial benefit to the companies that run 
e-dictionary sites. They add to the size of the dictionaries on offer (size being a simplistic
but common means of evaluating dictionary worth), and they give the impression
that the dictionary material is up-to-date, and that the site is technologically advanced. 
Most significantly, automation reduces development costs, just as a web platform 
reduces the costs of production and distribution, which is why companies such as 
Google and Amazon have embraced the move from print to digital and have persuaded 
the public to follow suit—Amazon started to sell more e-books than print books in 
May 2011 (Krek 2011). 

It seems that the print-to-digital migration has particularly affected reference mate- 
rials. People typically consult maps, encyclopaedias, and dictionaries while they are 
doing something else, for example whilst driving, writing, reading, listening, or con- 
versing, and under these conditions the electronic format can improve accessibility and 
ease of consultation (‘usability’ in Laufer and Kimmel’s (1997: 362) terms). Thus paper 
maps have largely given way to e-maps delivered via satellite navigation systems, with 
the result that old-style cartography companies have shrunk or closed down, while a 
new e-cartography industry has grown up (Parish 2004). The print edition of the 
Encyclopaedia Britannica ceased production in 2010, and most dictionary publishers 
now accept that print-based dictionaries will also largely disappear as content migrates 
to e-dictionaries of various types. 

Levine (2001) realized that digital migration was responsible for the decline in sales of 
encyclopaedias, but commented optimistically about the commercial future of diction- 
aries, at least English language ones: ‘a boom in English usage and commerce fostered by 
the World Wide Web ...seems to be having just the opposite effect on lexicography as 
it did on “encyclopediography”’. By 2003, however, he noted that the American monolingual
dictionary business was showing little growth. Electronic dictionaries had been
marketed alongside print dictionaries, but this had not resulted in an increase in over- 
all sales because e-dictionaries were being bought instead of hard-copy dictionaries, or 
were being bundled with print editions (Levine 2003). Esposito (2002) was even more 
gloomy about the outlook for commercial dictionary publishers. Given the increasing 
availability of free e-dictionaries, ‘all current attempts (except Microsoft’s) to put dictionaries
into electronic form are nothing more than a limp attempt to extend the life of
a failing business model’.

Some of the publishers’ responses to this situation echo McArthur’s predictions in 
1998 regarding the regionalization, localization, and electronicization of tomorrow's dic- 
tionaries. Esposito foresaw some possibility for growth in the niche markets for sophis- 
ticated or specialized lexicographical products, for example dictionaries of obscure 
languages and dialects. Kilgarriff (2005) agreed that there was scope for the marketing 
of smaller, more specific products to a world-wide customer base: ‘dictionary publishing 
is undergoing the same transformation as many other markets with the advent of the 
internet: the market fractures, and where there were a small number of products selling 
to millions, there are now millions of products—selling far smaller numbers—to billions’.
Similarly Rundell (2011) talks of ‘a more fragmented landscape’ moving towards
functionally diverse products for many different types of user. 
This fractured market may offer some hope for further developments in dictionary
content, continuing the progress made in the last decades of the twentieth century. It 
seems that the changing business environment favours technical rather than lexico- 
graphical innovation, with many new developments in interface design and automatic 
data extraction but fewer developments addressing language learners’ information 
needs. Small but pedagogically innovative dictionary ventures are, however, being
undertaken by academics financed through research funding. A number of such pro- 
jects are described in the Proceedings of eLex 2009 (Granger and Paquot 2010b) and 
the Proceedings of eLex 2011 (Kosem and Kosem 2011). They include the Louvain EAP 
Dictionary, a project at the Université Catholique de Louvain (Granger and Paquot 
2010c), LangYeast, a dictionary to help biologists writing in English being developed
at the Université Paris Diderot (Volanschi and Kubler 2010), and Diccionario de
Aprendizaje del Español como Lengua Extranjera (DAELE), a Spanish learner's dictionary
from Pompeu Fabra University in Barcelona (Mahecha Mahecha and DeCesaris
2011; Renau and Battaner 2011). The developers of these dictionaries aim to apply lin- 
guistic and pedagogical theory to lexicographical problems, drawing on the capabilities 
of the latest technologies. They are free to explore the effects of new search routes and 
defining methods because they do not have to sell their products, but at the same time 
resources are limited, and progress is therefore often slow. 


36.4 REVENUE SOURCES 


In the new business environment a few prestige e-dictionaries such as the Oxford English
Dictionary can be sold to universities and libraries, and a few niche e-dictionaries can 
be sold to individuals, but for the most part people expect to use e-dictionaries for 
free. It continues to be common practice for publishers to offer e-dictionary mate- 
rial as a means of adding value to their other products. Buyers of the Collins Cobuild 
Advanced Dictionary in book form, for example, can use myCOBUILD, an online ver- 
sion enhanced by the addition of specialist words. Similarly the Longman Dictionary 
of Contemporary English Online is a free online version of the CD-ROM, but users are 
urged to buy the full CD-ROM version to hear the pronunciation of 88,000 example 
sentences. Publishers may feel that they have to bundle electronic products in this way 
in order to maintain a competitive edge, but it is unclear whether it influences users’ 
choice of dictionary title, or results in any additional revenue. Morse (2008) regards all 
free e-dictionary access as a form of bundling, but considers that ‘so far, no bundled 
dictionary, whether with browser, search engine, operating system, or e-book reader yet 
looks likely to have a major impact on the dictionary business’.

It is also unclear whether the majority of print dictionary users really benefit from 
bundled e-content. Nowadays people continue to use print dictionaries in contexts 
where they do not have access to an electronic device, either because of school rules, 
or because of poor internet access or lack of equipment. Boonmoh and Nesi (2008), for 
example, surveyed 1,211 Thai university students who had been recommended by their
teachers to buy the Longman Active Study Dictionary in book form. Only 28 per cent of 
respondents claimed to own a monolingual dictionary on CD-ROM, even though the 
Longman Active Study Dictionary CD-ROM was included with the book, attached to the 
inside cover. Most of these students did not own computers, so the CD-ROM was use- 
less to them and was ignored. 

Whilst e-dictionary publishers may not be paid directly by the end-user, they can 
derive revenue from licensing deals with manufacturers and commercial websites. 
Standalone pocket electronic dictionaries and web-based dictionary download sites 
provide e-dictionaries from many sources, bilingual and monolingual, local and global 
(e.g. emanating from prestige publishing houses in Britain and the USA). Users may 
typically opt to consult the local bilingual source, but the inclusion of one or two prestig- 
ious sources adds credibility to the product and may help boost its sales. Oxford diction- 
aries are particularly highly regarded; the hardware companies AOnePro, Canon, Casio, 
Franklin, Seiko, Sharp, and Sony have all licensed content from Oxford University Press 
(Nesi 2009). 

Standalone pocket electronic dictionaries are now being superseded by 
internet-enabled devices. Tuteja (2011), for example, reviews a recent Casio model 
released in India (the EW-B2000C) and wonders whether it is worth the price: ‘Can't 
I read (or even listen to) speeches on my internet-enabled smartphone or laptop? 
Can't I download dictionary apps, that too for free?’ He remains unconvinced of its 
value, although he concedes that the model ‘might be of some use to writers and 
students who don’t have access to the Internet all the time, or find it a little bothersome
to locate and launch an app on the phone or PC only to look up a definition’. In
Europe interest in pocket electronic dictionaries may be growing, however, even as 
they lose ground in the Indian subcontinent and the Far East where they were orig- 
inally most popular. In 2010-11 Casio Europe collaborated with researchers at the 
University of Osnabrück to conduct what they claim to be the first scientific study in
Europe into the effects of pocket electronic dictionaries on learning (Ludewig et al. 
undated). A longitudinal project at the University of Wuppertal is also introducing 
pocket electronic dictionaries in a number of German schools, as a motivating alter- 
native to dictionaries in book form (Diehr undated). 

A further source of revenue is the licensing of e-dictionary content for use with 
e-books and media websites, so that readers can access definitions of unknown words 
directly from the text. The British version of Amazon Kindle comes with the New Oxford 
American Dictionary and the Oxford Dictionary of English pre-installed, for example, 
and the Indian online newspaper DNA (Daily News and Analysis) uses the Macmillan 
English Dictionary for Advanced Learners as a ‘plugin’ (Rundell 2011). 

As predicted by de Schryver (2003) and Parish (2004), such applications mark 
a move away from the dictionary as a standalone product, and open the way to cus- 
tomization by licensees. This may involve adjusting layout and functionality, adapting 
content, and/or supplementing it with material dynamically generated from internet 
resources. As e-dictionaries are cut up and recompiled, content appears and disappears 
without warning and without trace. We lose standard bibliographical information such
as the editor’s name and publication date, and we can no longer refer to a rationale 
and content overview of the kind traditionally provided in the front matter of a print 
dictionary. 

All this has serious implications for dictionary evaluation and dictionary skills train- 
ing. Customized e-dictionaries are difficult to review because of their instability and 
lack of front matter. In turn the lack of scholarly description and evaluation makes it 
difficult for teachers and students to interpret lexicographical content, and to choose 
which dictionary sites to use. Under these circumstances, perhaps the best kind of edu- 
cation for dictionary users is one which encourages a critical stance, and helps to dispel 
blind faith in the authority of all works entitled ‘dictionary’. Dictionary websites which
invite users to comment and collaborate have the potential to support dictionary con- 
tent evaluation, as can be seen from the active critical and scholarly discussion on the 
Leo Dictionary site, for example (Nesi 2012: 368-9). 

Apart from selling dictionary content to the hardware manufacturers and to commercial
websites, publishers can also sell advertising space on their own e-dictionary pages.
As Morse (2008) points out, ‘In the online world... we don't sell the dictionary; we sell 
the eyeballs that look at the dictionary’. Morse claims that the free Merriam-Webster
dictionary website attracts a great deal of advertising revenue, and Lannoy (2010) and 
Caruso (2011) both note that dictionary sites are attractive to advertisers because of the 
time and attention dedicated to dictionary consultation. Kilgarriff (2009) is somewhat 
less optimistic, arguing that advertising works best for dictionaries based in the USA, 
and those which already have a strong brand name. He notes that in the UK, Cambridge 
Dictionaries Online have been the most successful in making money from advertising. 
Cambridge was an early adopter of the advertising strategy, however, and it could take 
time for publishers who started later to reap the same rewards, and to make enough to 
sustain a lexicographic team. 

Rundell (2011) compares the Macmillan site, which at that time had relatively few 
advertisements (just a header and a column in the right margin of the webpage), to the 
noisier Cambridge Dictionaries Online where advertisements surround the definition 
on all sides. The noisier site may bring in more revenue, but the quieter site may be less 
distracting for the user. An informal test conducted by the blogger Marc Wandschneider 
(2010) seems to support this view. Wandschneider compared his own experiences of 
looking up the same words in four bilingual Chinese-English / English-Chinese dic- 
tionaries over a period of two or three weeks, and although he does not explicitly com- 
ment on the presence of advertising material, he states a preference for interfaces that 
are ‘clean’ and ‘spartan’, with ‘lots of blank space’. Dziemianko (2011) suggests that students
might learn more from consulting an online dictionary that is advertisement-free.
In studies comparing users’ retention of meaning and collocations, she found that those 
who referred to the online Collins Cobuild Advanced Dictionary (which did not have 
advertisements) had significantly better scores than those who used e-versions of the 
Oxford Advanced Learner’s Dictionary and the Longman Dictionary of Contemporary 
English which did. 
Lannoy (2010: 174) proposes that publishers should establish business partnerships
with companies selling complementary products or services on the web. Companies
may be willing to pay e-dictionary publishers to perform an intermediary role, lead- 
ing users from initial dictionary consultations to related products such as course books, 
novels, reference materials, news media, or learning materials. 

Some sites, such as Wordnik.com, contain no overt advertising links but encourage 
users to sign up as members of an online community where they can tag words, cre- 
ate lists, and post comments. This kind of social networking activity has potential as an 
indicator of purchasing habits, and could lead to users being targeted as consumers at 
some later stage. Although members of such e-dictionary user groups are unlikely to 
view their personal data as having any commercial worth, according to the information 
on the Wordnik site the President of Wordnik takes a special entrepreneurial interest 
in ‘opportunities in next-generation social commerce, community, crowdsourcing, and 
social media’. In these early days of social networking, it is impossible to gauge how such
data might eventually affect e-dictionary use, design, and finances. 

In the future it is conceivable that sensor technology could provide an even more 
sophisticated way of interacting with e-dictionary users. Films and games can now be 
customized to individual requirements by monitoring heart rate and sweat levels (see 
<www.shimmer-research.com>). The data are sent from a skin response sensor worn 
by the user to a smartphone app, which then adjusts on-screen content accordingly, in 
real time. In the same way, subtle changes in dictionary users’ physiology might one day 
trigger the provision of simpler or more complex dictionary definitions, more in-depth 
treatment of the look-up item, or links to activities related to the look-up item. 


36.5 E-DICTIONARIES AND WEB 
PRESENCE: THE WAY FORWARD 
In order to maintain a competitive edge, dictionary publishers now work to maximize
their web presence. Lannoy (2010) recommends the use of search engine optimization 
strategies to improve traffic to e-dictionaries and increase the speed with which they 
deliver the information users desire. The entry point for many e-dictionary consulta- 
tions is a search engine such as Google, Yahoo, or Bing, and these techniques can dra- 
matically increase the number of general dictionary searches that lead to a specific site. 
Moreover many potential dictionary users have backgrounds in languages other 
than English, and specific local needs. For example Google Insights for Search
(<www.google.com/insights/search/#>) reveals that in the twelve months leading to June 2012
the greatest number of Google searches for ‘English dictionary’ came from Pakistan, 
Cambodia, Afghanistan, Bangladesh, Myanmar, Mongolia, India, and Nepal (Google 
does not operate in mainland China). Lannoy (2010: 180) concludes that ‘internet 
strategy needs to be designed market by market’ and advises publishers to localize, by
providing bilingual content, and by translating their dictionary interfaces. 

This advice perfectly echoes the thoughts of McArthur (1998b), who foresaw the 
need for localization, bilingualization, nationalization, and regionalization to meet the 
requirements of the next generation of dictionary users in the global e-market. The 
days of most authoritative, monolingual print dictionaries may be numbered, but 
there are exciting opportunities ahead for dynamic, adaptive bi- and multi-lingual 
local e-dictionaries. 


CHAPTER 37 
37.1 INTRODUCTION


LANGUAGE can be studied from linguistic and social points of view. Lexicography has
generally tended to deal with the practical problems at hand, that is, how to create refer- 
ence works that are relevant (e.g. Béjoint, this volume), focused, and marketable (e.g. 
Grundy and Rawlinson, this volume). These priorities have in turn informed theoretical 
considerations. The types of dictionaries at the forefront of this chapter are dictionar- 
ies of national varieties of a language. National dictionaries are defined as dictionaries 
that are marketed specifically to a country. They can serve as agents, contributors, and 
markers of national identity, and as such function as major tools in language planning 
processes for national languages and national varieties.¹

In Einar Haugen’s (1966: 16-26) ground-breaking study on language conflict and 
language planning, a four-tiered approach to language planning is introduced: first, 
the selection of a variety as a norm; second, the codification of that variety’s forms in 
orthography and pronunciation; third, the elaboration of its functions to meet all com- 
municative needs (via innovation or adaptation of lexical items); and fourth and last, the 
socially most important criterion of the degree of acceptance in the speech community. 


1 I would like to thank Elizabeth Hodgson for helpful comments on a draft of this chapter. All faults
remain entirely my own. 
Dictionary-making, which is part and parcel of the second and third stages, the linguistic
codification process, has often been tied with the rise of nationalism in one way or
another. Lexicography as a discipline, however, has traditionally left the connection 
between the codified variety and the expression of nationalism largely unaddressed. 
Major research contributions such as the monumental, now somewhat dated, HSK ref- 
erence handbook on Dictionaries (Hausmann et al. 1989-91) do not explicitly address 
the link between language and nation. 

The connection between the linguistic and the cultural seems to be either too clear to 
be expressly explored or too elusive to be adequately captured by scholars of lexicogra- 
phy. Lexicographers deal primarily with lexicographical problems, and hence national 
dictionaries are usually treated under the general heading of monolingual dictionaries. 
According to one standard textbook, monolingual dictionaries are seen as filling gaps in 
the native speakers’ knowledge of the language (Svensén 2009: 13). Svensén's text gives
a good example of the highly general level of observations on language and culture in 
lexicography when he says that: 


It is a commonplace to say that a dictionary is a product of the culture in which it has 
come into being; it is less so to say that it plays an important part in the development 
of that culture. (Svensén 2009: 1) 


Treating a dictionary as a reflection of a particular sociolinguistic situation, as expressed 
above, is a safe assumption: today’s lexicographers take a descriptive approach and 
document the language or variety (see Mugglestone, this volume). But what about the 
language-planning and prescriptive aspects of lexicography? The present article aims 
to address national dictionaries as catalysts and agents in national identity construc- 
tions. As our understanding of the actual cultural work of national dictionaries becomes 
much richer if we examine them in their social and historical contexts, the present chap- 
ter rests on two case studies. Both cases are in the context of pluricentric languages. 
Pluricentric languages (e.g. Clyne 1992) are languages that have more than one centre of 
linguistic autonomy, although not all centres are usually equally powerful. The two cases 
are chosen for their relevance in delineating one variety of a given language in relation to 
other, generally more powerful, varieties of the same language. The processes cannot be
shown to the same degree with two varieties that are not mutually intelligible, as a lot of 
the identity-forming factors are already a priori present (e.g. Kloss 1967).


37.2 NATION STATE AND STATE NATION 


An important distinction in the social sciences with regard to language and 
nation is made between state nation (German: Staatsnation) and nation state 
(German: Nationalstaat). The former found its expression historically in feudal states 
that combined speakers of different languages. The latter, the equation of language and
nation per se, is a modern development and is usually considered a consequence of the
French Revolution (e.g. Coulmas 1985: 41; Ammon 1995: 19). The Académie française
was one of the first institutions to codify what was to become a national language, and
its history shows how piecemeal that process was. For instance, after one and a half centuries of centralized
language planning in France, by the time of the French Revolution less than half of all 
French citizens spoke French (Fishman 1972: 78). The time between 1789 and about 1900 
was crucial for the connection of language and nation, so that by the early twentieth 
century philology was seen as a precursor to nationalism: ‘There is not a new nation 
in Europe which has not been preceded by from fifty to eighty years of philology and 
archeological studies’ (Fournol 1931, quoted in Fishman 1972: 45). 

The concept of a nation state in the European context is crucially linked with 
German philosophers, above all with Johann Gottfried Herder (1744-1803) and the 
concept of ethnic nationalism, and with Johann Gottlieb Fichte’s (1762-1814) plans 
for nationalistic education (Wright 2004: 61-2). One of the central ideas around the 
nation state is linguistic, since it equates speakers of one language with one single 
culture, who, it is thought, should reside as one nation in one country. As such, the 
concept of a nation state by definition marginalizes minority groups in the territorial 
state. Nationalist ideology entails the idea that speakers of one language represent one 
nation and are to organize themselves into one state—the nation state (sometimes also
referred to as Sprachnation ‘nation of one language’ or Kulturnation ‘nation of one 
culture’). 

By contrast, a state nation is the result of consensus among people to form a com- 
mon state within a given geographical boundary, regardless of concepts of language 
or ethnicity. Examples of state nations are Switzerland, where speakers of four official 
languages share one state and nation, Canada, where English and French are official 
languages, and Austria, which shares a common language with Germany and part of 
Switzerland among other nations (see Ammon 1995: 30-4 for an alternative terminol- 
ogy). The effects of the notion of the nation state are still powerful and active today. In 
1993, the state of Czechoslovakia peacefully disintegrated into a Republic of Slovakia 
and a Czech Republic, and linguistic differences were among the cultural differences 
that resulted in the break. In the wake of the founding of new states, two new standard 
languages of what used to be Czechoslovakian have been codified, Czech and Slovak. 

An abstract entity such as the nation is in constant need of being justified with char- 
acteristic ‘distinctive and unique ways of thinking’ (Wright 2004: 35). Linguistic means 
serve as one such medium. A common distinction in language planning in general, 
including nationalist language planning, is between status planning, corpus plan- 
ning, and acquisition planning. Status planning concerns the choice of a variety to be 
codified and, usually, this choice falls on the politically and economically most power- 
ful group and region. Corpus planning then works to differentiate the chosen variety 
from related varieties and may work towards establishing independent linguistic mark- 
ers. Acquisition planning concerns the teaching and spread of the codified national 
language or variety by means of a written standard language, the ‘official’ medium of 
communication. 
The case studies presented in this chapter are both from state nations that share
a mutually intelligible language with more powerful adjacent neighbour states. In 
these instances, the nationalist linguistic markers need to be made more visible, 
while the processes of their codification are more fraught than can be seen in the 
codification of two mutually unintelligible varieties. By focusing on the issue of the 
codification of Canadian English (CanE) and Austrian German (AusG), which are 
both language varieties that require considerable corpus and acquisition planning 
(i.e. Ausbausprache, in Kloss’s 1967 sense) in a political climate where opposition
to these goals is at least tenable, useful insights can be gained from a comparison 
of these two rather different instances of the same type of situation. For on the 
one hand, Canada and CanE know themselves in a global context of English as a World
Language, while on the other hand AusG knows itself to be a pluricentric language of 
lesser international importance. The effects of these differing senses of external refer- 
ence, global status, historical background, societal structure, and nationalist freight 
for the language therefore illuminate how the constitutive mappings of language and 
nation can function. 


37.3 AUSTRIA AND CANADA: POLITICAL
AND LINGUISTIC INDEPENDENCE 
At first glance the language-external—that is, the political—histories of Austria
and Canada are rather different, with the former being an old world country with a 
long settlement history and tight restrictions on immigration and the latter a new 
world country with more generous immigration policies. The sociolinguistic pre- 
conditions for the development of an Austrian Standard, for instance, have existed 
only since 1866, when Austria was no longer part of the German Confederation 
(German Bund, 1815-66) (Ammon 1995: 120). Prior to 1866, Austria was too heavily
involved and connected to the other German states. While Austria was leading
other German-speaking states politically in the German Bund, the accepted lin- 
guistic norms were taken over from the norm-providing centres in Germany. The 
post-1866 era, when the rejection of Austria in the German Bund resulted in the 
foundation of the Austrian-Hungarian Monarchy, is also the time when the concept
of ‘österreichisches (Hoch)deutsch’, Austrian formal standard or colloquial
standard (see Table 37.1), is first mentioned (Wiesinger 1988: 16). Prior to 1900 there 
seems to have been little awareness of an Austrian Standard variety, and features of 
educated speakers that seemed to diverge from the norms in Germany were viewed as
outright wrong (Ammon 1995: 123). 

The disintegration of the Austrian Empire, a state nation, into smaller nation states in 
1918 reduced Austria to the predominantly German-speaking areas under Habsburg rule. 
As such, the first Austrian Republic was conceived by its elites as ‘Deutsch-Österreich’
Table 37.1 Barbour and Stevenson's approach to language levels in German and
English

German terminology                      English terminology    Barbour and Stevenson

Standardsprache ‘standard language’     Standard language      Formal standard
Einheitssprache ‘unified language’
Schriftsprache ‘written language’
Literatursprache ‘literary language’
‘Hochdeutsch’                                                  Colloquial standard
Umgangssprache ‘colloquial language’    (Non-standard)         Colloquial non-standard
Dialekt, Mundart                        Dialect                Traditional dialect

Source: Barbour and Stevenson (1990: 141)




(German-Austria) with the aim of integration into Germany at a later stage. Both the 
intent and the name (German-Austria) were forbidden by the allied victors of World War
I, which led to the Republic of Austria, although public support for this state was weak 
and the socio-economic situation combined with other factors to lead to the integration 
of Austria into Nazi Germany in 1938 (Anschluss). Such political realities were hardly 
conducive towards developing Austrian linguistic standards. It was not until 1945 when a 
new Austrian Republic was founded, this time determined to be an autonomous nation, 
that the sociolinguistic conditions of linguistic autonomy were created. 

By contrast, Canada was a colony of Britain prior to 1867. With Confederation 
in that year, a beginning was made towards the establishment of a more independ- 
ent political unit. Scholars usually consider the 1926 Imperial Conference and the 
Balfour Declaration, in which the UK agreed that the British Dominions were 
to be treated as equals, as the starting point of Canadian independence. However, 
Canadian Citizenship (as opposed to British) was not established until 1947, which 
suggests the end of World War II as the start of fully-fledged Canadian independence. 

From this perspective, Canada and Austria reached full independence as state nations 
at about the same time. I will outline in this chapter the developments of the lexico- 
graphical traditions in both countries in relation to expressions of national identity. 
Important language-planning events relating to national monolingual dictionaries will 
be addressed and public discourse, relying on the popular press, will be juxtaposed with 
commentaries and reviews of reference works by scholarly elites, to allow the processes 
of identity formation to be gauged. 
37.4 THEORIZING THE STANDARD: ENGLISH
AND GERMAN 
Concepts of a standard vary both linguistically and socially from language to language.
English is today’s global language, and Kachru’s (1985) Circle Model has
often been used to classify it: an Inner Circle of former settler colonies (the UK, 
Ireland, USA, Canada, Australia, New Zealand), a vast Outer Circle of former colo- 
nial countries where English was taught primarily through the education system 
as a second language and not as a first language (e.g. India, Pakistan, Bangladesh, 
the Philippines), and a growing Expanding Circle of countries where English did 
not have historical ties but is now increasingly used (e.g. Russia, China, countries 
of continental Europe). German has also been conceptualized as a pluricentric lan- 
guage since Clyne (1984). The German language is spoken in eight countries, and is 
the official language in Germany, Austria, Liechtenstein, and in German-speaking 
Switzerland (Wiesinger 2006c: 202). It has official regional status in eastern Belgium, 
Luxembourg, and South Tyrol (Italy); in parts of Alsace and Lorraine (France) it is 
still in public use. 

The difference in geographical spread does not relate to the degree of variation found 
in English and German. A central problem in all linguistic, and especially lexicograph- 
ical, projects pertains to the types of speech, or speech levels (Sprachschichten), to be 
included in a dictionary. Standard and non-standard are, generally speaking, broader 
concepts in English than they are in German. Table 37.1 highlights differences in the
German and English terminology and presents Barbour and Stevenson's terms, which are
adopted in this chapter.

In Austria, awareness of traditional dialects (Mundart/Dialekt) is high (see 
Wodak et al. 2009; de Cillia 1998). For the purposes of national dictionaries, two 
areas in the right column of Table 37.1 are of the most profound interest: formal 
standard and colloquial standard. These two standards are what one would expect 
to be unmarked in national desk dictionaries of German, which include occasional
mention of (marked) important non-standard colloquialisms and perhaps even traditional
dialect words. The standard is, of course, subject to change, and yesterday's
non-standard colloquial forms may be tomorrow’s standard. This area is perhaps 
the most profoundly criticized in any dictionary of German as it requires extensive 
study of social, geographical, and attitudinal use of all lexemes and meanings listed. 
Austrian dictionaries are criticized by Austrian elites for including non-stand- 
ard terms as standard terms (e.g. Wiesinger 1980). German dictionaries lay claim 
to represent the entire German-speaking world, while being focused on German 
German. As Ammon (1995: 366) puts it: ‘The Duden dictionaries are, in spite of 
their claim to represent all German-speaking areas, obviously limited to their own
national point of view.’ Given that such external codification attempts usually come
from countries whose language variety is the dominant one, the practice poses 
imminent threats to the independence of any non-dominant variety of a language. 

The codification of a standard is complicated not just by social and situational, but 
also by geographical variation. Austria’s geographic territory does not fully overlap with 
any traditional German dialect zone, which is a feature of national varieties that is not 
unusual. In the west of Austria, Alemannic dialects of German are spoken, whereas the
rest (in fact the vast majority) of the country speaks Bavarian dialects, which show
some correspondence with the Bavarian dialects of Germany. In this context, Austria's 
pan-national vocabulary that developed as a result of the Habsburg administration is 
often seen as a unifying factor, but is at times discounted as a marginal factor in the 
light of the traditional cross-border dialect zones. Anecdotal evidence suggests another, 
newer level of reality that the traditional dialect classifications do not seem to cover. 
For instance, it is fairly easy for an Upper Austrian to identify a speaker from bordering
Bavaria and vice versa, usually within a couple of words or a sentence, despite the two
regions sharing the same traditional dialect zone; for example, es pressiert ‘I'm in a
hurry’ is unlikely to be used in Austria. It seems conceivable that sociophonetic dialect
studies would identify isoglosses that align more with national boundaries, as was shown
for Canada and the United States (Labov et al. 2006: 22.4).

In CanE different features are foregrounded in the linguistic discussions. The linguis- 
tic discourse centres around the homogeneity of CanE (Chambers 2012). In the homo- 
geneity discussion, individual speech enclaves that resist homogenization more strongly 
have been proposed: Newfoundland, the most distinct dialect region, the Ottawa Valley, 
and smaller pockets across the country, mostly in the east. Generally, even these linguis- 
tic pockets are found to be assimilating more and more to the existing Standard CanE. 

Regional variation in CanE exists most obviously on the lexical level but is con- 
sidered as traditional dialect (see the Dictionary of Newfoundland English or the 
Dictionary of Prince Edward Island English) and is thus not part of the Canadian 
Standard. Four major dialect zones of CanE can be identified: Newfoundland, the 
Maritimes (New Brunswick, Nova Scotia and Prince Edward Island), Quebec, and 
Central and Western Canada (subregions exist, see Boberg 2005). From a national
perspective, of the roughly 2,200 Canadianisms (Cdn.) in the latest edition of the Canadian
Oxford Dictionary (second edition, 2004), 700 are marked as regionalisms (almost
40 per cent of them from Newfoundland), which implies that c.1,500 are of national
currency (Dollinger and Brinton 2008: 56, fn 13). Standard CanE is thus an important
Canadian variety. It is defined socially as the dialect of Anglophone Canadians who 
are second generation or later (born in the country) and who are urban dwellers and 
middle class (Chambers 1998: 252). According to one estimate, about 36 per cent of 
the Canadian population are speakers of Standard CanE (Dollinger 2011b), which is 
a much higher percentage than British RP (Trudgill and Hannah 2002) or Standard 
American English (Kretzschmar 2008). 


2 All English translations are my own. 

Discussions about Standard AusG have, first and foremost, centred on the written
standard, which is disputed in linguistic circles (see Wiesinger 1980; Muhr 1983
for two profound, yet different assessments). Central to the critiques is a demand for 
a sociolinguistic study of the standard, by considering its interface with regional and 
dialectal forms. Some linguists implicitly or expressly reject language planning as out- 
side the purview of linguistics (see Wiesinger 1980: 396; Scheuringer 1996). While the 
ways and means of shaping the standard are subject to critique, national dictionaries 
can only go so far in their description of linguistic facts. At some point, the postulation
of prevalent forms as standard is needed, which borders on prescription. It is here
that linguists offer rather different assessments, with some not accepting the concept
of Standard AusG or prescriptive elements (e.g. Scheuringer 1996; Pohl 1997), others
promoting them (e.g. Muhr 1995; de Cillia 1998), and yet others seeing the existing
recommendations as based on imprecise analysis or none at all (e.g. Wiesinger 1980; Muhr
1983; Schrodt 1997).

Such discussions on the nature of the standard are central to language planning
processes. It seems that in the Austrian case the development of a common written standard
for Austria, Germany, and Switzerland prior to World War II is taken as an argument
against the codification of an Austrian written standard in the present day. The same
point of criticism levelled against AusG (that there is no coherent dialect zone in the
state's geographical area) was applied to Norwegian as it wrested itself from Danish
domination and Swedish influence to become a national variety, starting in the 1830s.
None of the linguistic features that were selected for the new Norwegian written standard
were universally used throughout Norway, ‘but each one by itself represented a widely
diffused pattern, and together they made a structure which undeniably constituted a
different language from Danish’ (Haugen 1966: 36). This seems to be an important point: if
one were to expect a perfect match of national geography and linguistic boundaries, there
would be no Norwegian written standard today, when in fact there are two: Bokmål (a
more conservative standard, closer to Danish) and Nynorsk (based more on Norwegian
features). The question is thus a political, not a linguistic one, and lexicographers will
need to declare their points of view on both issues.


37.5 AUSTRIAN AND CANADIAN DESK 
DICTIONARIES 


The lexicographical landscape of desk dictionaries in Austria is limited to one 
Austrian-made desk dictionary, the Österreichisches Wörterbuch (Austrian Dictionary,
hereafter OWB). The Canadian situation is more complex: the Gage Canadian 
Dictionary (in its latest edition from 1997) is the longest-standing CanE dictionary, 
while the Canadian Oxford Dictionary (2004) is the most widely known, and there are 
less widely used dictionaries in the mix. 

The OWB was first published in 1951 and is the only official dictionary in Austria for
use in schools and governmental administration. It was commissioned by the federal
Department of Education (Unterrichtsministerium) for use in grades 5-12 of all
school types, and every Austrian school child is given a copy in grade 5 as part of
the school book programme, which provides school books free of charge. The
OWB has been called by linguists and dialectologists ‘the central and most disambigu- 
ous part of the codex of Austrian Standard German’ (Ammon 1995: 138) or an ‘essential 
marker’ of the connection of Austrian national identity and language (Wiesinger 2006c: 
203). The OWB has also seen profound criticism, however, especially since its first revi- 
sion in 1979. After appearing unchanged in 34 reprints (confusingly called ‘editions’) 
from 1951 to 1978, the 35th edition included dialectal terms and colloquial terms that 
were presented as unmarked, and hence implicitly standard, forms. The critique was not 
about the inclusion of such terms, but the inclusion without appropriate labelling. One 
additional problem lies in the uncertain status of some terms. The subsequent 36th edi- 
tion reversed some of the additions while the 41st edition from 2009 has not only further 
grown in size and scope (780 pages and about 70,000 headwords) but continued a more 
cautious approach to augmenting the standard. The current 42nd edition (2012) features 
a drastic enlargement of the lexicon and, with more than 1,000 pages, has reached for 
all intents and purposes the scope of its German counterpart Duden Rechtschreibung 
(about 110,000 headwords; compare Buchmann, this volume). The OWB offers con- 
cise information on orthography (its main function, as is customary for dictionaries of 
German), regional and usage labels, morphology, syllabification, some example phrases, 
and aspects of pronunciation, besides very brief definitions of lexemes. 

A characteristic of the OWB has been until recently the ‘starring’ of lexemes with an
asterisk (*). The asterisked marking was meant to dissuade users from using ‘un-Austrian’
words and it was this point that triggered the harshest criticisms of the 35th and
subsequent editions. In most cases these were German German words that speakers were
discouraged from using (see Ammon 1995: 181-96). The 41st edition abandoned asterisked
words altogether, using instead national and regional labels: D for Germany, CH for
Switzerland, and SüdT for Southern Tyrol (OWB-41: 11). Some words asterisked in earlier
editions are now marked with regional labels, for example Abitur (D), Aprikose (D, CH),
Jura (D), Quark! (D), Sahne (especially D). The user's guide simply states that the Austrian
‘standard is not marked’ and that two other levels, umgangssprachlich (colloquial, i.e. both
colloquial standard and colloquial non-standard) and mundartlich (traditional
dialectal), are used. For the important regional dimension, a cluster of regional labels within
Austria is introduced. In hindsight, it seems that expert criticism has led to important 
adaptations in regional and national labels. 

The Gage Canadian Dictionary stands in a long tradition of Canadian dictionary pub- 
lishing, which commenced in 1954 at the founding meeting of the Canadian Linguistic 
Association. It is comparable with the OWB in terms of tradition, but not in its con- 
nection to academic scholarship. The beginnings of the OWB were not connected to 
academic research, while the Gage Canadian was compiled by some of the most expert 
academic dialectologists of their time. Gregg (1993) offers an overview of the history
of lexicography in Canada, while Considine’s (2003) concise yet comprehensive overview,
still very much up to date, is an indispensable springboard for research on Canadian
English lexicography. Both OWB and Gage are expressly intended for use in
schools (to high school or Matura levels) and as such serve important acquisition planning 
functions. The Gage has been the leader in Canadian high schools for many years now. One 
difference between the Gage and the OWB is that the latter is supported by the Austrian 
Ministry of Education, whereas the Gage dictionaries are the product of a publishing house 
and need to compete with other dictionaries on the national market. An attempt by the 
German Duden Verlag, publisher of the Duden dictionaries (which can be compared in 
prestige to the Oxford dictionaries in the UK), to get an Austrian student edition approved 
was declined by the Austrian ministry (Ammon 1995: 128). 

While AusG dialects have been attributed identity marking functions in the linguistic 
literature (e.g. Wodak et al. 2009), CanE, because of its homogenization phenomena, is of 
more limited use, yet identity markings are also linguistically performed in Anglophone 
Canada. However, the topic of language in Canada is primarily occupied with the relation- 
ship between French and English, and CanE as such is just one of many traits that may play 
a role in national identity formations for Anglophones, with other characteristics (e.g.
universal health care, gun legislation, perceptions of international cooperation, peace
keeping, and so forth) often weighing more heavily for the individual. Given the lack of
awareness of features of CanE in the general populace, which is perhaps best evidenced by
repetitious newspaper reporting on the ‘discovery’ of CanE since the 1950s (Dollinger
2011c), the different uses of AusG and CanE as markers of identity may be a crucial
distinction, which is
reflected in the scholarly as well as the public discourse, to which we turn next. 


37.6 RECEPTION OF NATIONAL 
DICTIONARIES IN CANADA AND AUSTRIA 




Canada’s Centennial Year, 1967, saw the completion of the Dictionary of Canadian 
English series of Gage Ltd. The series comprised a flagship historical dictionary, the 
Dictionary of Canadianisms on Historical Principles (DCHP-1, see also DCHP-1 Online) 
and three graded dictionaries for school use. The Senior Dictionary, intended for high 
schools to grade 12 and the equivalent of the OWB, was also first published that year. It 
was the first edition of what later became the Gage Canadian Dictionary. The Gage series 
was a very important step in Canadian lexicography. It marked for the first time locally 
adapted dictionaries made in Canada by Canadians for Canadians and aimed at replac- 
ing the British and American dictionaries that were the only options up to that time 
(Gregg 1993: 35). The series has been updated multiple times; for the Gage Canadian the 
most important, because most substantial, revisions are those of 1983 and 1997. 
Reception of these Canadian works seems to have been positive from the press and 
knowledge elites alike since their inception. Starting in the late 1960s, Canadian newspapers
adopted the Gage Senior Dictionary as their in-house reference manual. As one editor
said, ‘We, at The [Toronto] Telegram, are much impressed by this work’; another labelled
the series as ‘[p]erhaps the most significant contribution to Canadian letters in the past 
300 years’ (Dresser 1968), and of the historical dictionary another writer remarked that it 
‘bears lively and scholarly witness to that most elusive of Centennial projects, the Canadian 
identity’ (Moore 1967). 

The reception of CanE dictionaries has been favourable from the outset, although 
lacking in detail. The late 1990s witnessed intense competition in the area of national 
desk dictionaries and a phase of sustained attention in public discourse, as the years 
1997 and 1998 saw the unparalleled phenomenon of three major national dictionaries 
appearing in new editions. First, the Gage Canadian Dictionary appeared in its latest 
revision, followed by the brand-new ITP Nelson Canadian Dictionary of the English 
Language. (Although they were at first competitors, Nelson (at one point part of the 
Thomson empire, hence ITP) would acquire Gage around 2003, including all of its 
dictionaries.) 

The ITP Nelson was geared towards the general user market, unlike the Gage 
Canadian, which, as a school dictionary, excluded taboo words. The ITP Nelson offered 
a product to the general user dictionary market that anticipated the publication of the 
Canadian Oxford Dictionary in 1998. Specialists reviewed both the new Gage Canadian 
and the ITP Nelson favourably at first. Chambers’s (1997) review of the Gage Canadian 
and ITP Nelson puts the two dictionaries to the test, finds both worthy and an improve- 
ment, and wonders what, about a year later, the Canadian Oxford Dictionary would 
‘possibly offer that these two have overlooked’.

After the Canadian Oxford Dictionary appeared in 1998, there was a significant shift 
in the focus of press attention. The Canadian Oxford Dictionary used as a starting 
point the British Concise Oxford Dictionary (eighth edition), considering all of its lem- 
mas afresh from a Canadian perspective as well as adding new material, drawing on 
reading programmes and on corpora of Canadian English. The Gage dictionary series 
similarly took ‘as a starting point the series of Thorndike-Barnhart American
dictionaries’ (Gregg 1993: 35), and the ITP Nelson dictionary was based on the American
Heritage High School Dictionary (third edition). Landau (2001: 43) calls the history of 
English lexicography a ‘recital of successive and often successful acts of piracy’ and the 
copying of lemmas and parts of entries is established practice. In another application 
of this principle, almost no dictionary is ever started from scratch. This chapter will 
not attempt to evaluate in detail how successfully each of this group of three dictionar- 
ies created a new Canadian dictionary from its US or British base, but it is interesting 
to note how reviewers, both in the general press and in academic publications, seem 
to have paid much closer attention to the US bases of the Gage Canadian and the ITP 
Nelson than to the British base of the Canadian Oxford Dictionary. As one reviewer
puts it, ‘The American bases [of Gage Canadian and ITP Nelson Canadian] exert an
ineradicable influence on these Canadian versions.’ For the 1997 Gage Canadian,
Lahey (1996) highlights the ‘originally... American database (with significant input 
from Gage's 1967 authority, the Dictionary of Canadianisms on Historical Principles)’ 
without evaluating in any detail the extent to which it departed from this base in any 
of its three major revisions (1967, 1983, 1997). It is tempting to see these comments 
as reflections of wider linguistic and cultural factors: a US base is seen as inherently 
suspect, at least in part as a result of a general anxiety about US linguistic and cultural 
dominance. 

Switching to AusG, press coverage could hardly be more different. In contrast to CanE,
public discourse on AusG is, when present, not entirely positive, and it is not taken as a 
given that the existence of dictionaries of AusG is necessarily appreciated by everyone. 
For instance, the OWB’s media coverage is at best split. 

The roots of this scepticism are old. Coverage in its pre-publication stage was 
utterly negative (Krassnig 1958: 156). According to Krassnig (1958: 156), who carried 
out the bulk of the work (Wiesinger 2006b: 179), the OWB was labelled a ‘superfluous,
sign-of-the times, offensive’ project that was geared against ‘Gesamtdeutschtum’,
which is best translated as ‘common German-ness’. It seems clear that the spectrum of
political opinions in the public discourse is wider in Austria than in Canada, which
is reflected in such attitudes. In the context of the immediate post-war period and
Austria's political fight for independence from Germany, not all OWB commentators
endorsed the linguistic means for asserting ‘Austrian-ness’. Wiesinger (2006a) illustrates
the political efforts at the time to foster and nurture an Austrian national identity
as, in the words of Austria's first chancellor after World War II, something other
than ‘a second German state’ (2006a: 404). The OWB was one tool to stress Austria’s 
autonomy. 

Fifty years later, linguistic research suggests that AusG is indeed important as an 
identity marker in opposition to Germany (de Cillia 1998: 101), which offers a rationale
for the original marking of non-Austrian words in the OWB, most of which are
German German lexical items. De Cillia summarizes discourse analytical findings that
German-speaking Austrians have an ‘ambivalent relationship’ to AusG:


On the one hand, the importance of language for Austrian-ness is stressed. ... On the 
other hand, there exists little awareness of a standard variety of AusG of the pluricen- 
tric language German. (de Cillia 1998: 101-2) 


For CanE, no such statements seem to exist. When asked about language attitudes
towards CanE, however, two-thirds of Canadians in Vancouver claim that they can tell
Americans from Canadians by their English (n = 421) and about 80 per cent believe in
a ‘Canadian way of speaking’, which applies more to university-educated respondents.
Such data gauge national identity constructions and reflect a desired answer more than
the actual ability of detecting and identifying linguistic features. 


37.7 THE ROLES OF AUSTRIAN AND 
CANADIAN DICTIONARIES IN IDENTITY 
FORMATION 




It is often said that the formation of a national variety requires the existence of
linguistic features of nation-wide or near nation-wide currency. This does not mean that
regionalisms are void as markers of national identity, but at least some pan-national
linguistic items would need to be in existence or would need to be developed. Here, both
AusG (e.g. Wiesinger 2006c: 206) and CanE (e.g. Görlach 1990: 1483) are labelled as
having relatively few lexemes that are national and standard. Wiesinger (2006c) lists 3
per cent, that is, 3 in 100 standard words, as Austrian. The number (types or lemmas) of
national variants is one factor, while their use (token frequency of each type) is another 
(Muhr 1995: 94; Schrodt 1997: 19). However, the relative frequency and, above all, the 
symbolic importance that is given to national variants, however viewed, are relevant. In 
Canada, for instance, toque ‘hat’ and parkade ‘parking structure’ are two types, but their 
token frequencies are relatively high. That is, while they are only two words (types), both 
of them are used fairly frequently and thus their frequencies (token counts) are much 
higher and more important for CanE than the observation that they are ‘just’ two words 
out of the total number of, say, 40,000 words in an educated speaker's lexicon, or five 
thousandths of a percent of such a lexicon. In Austria, likewise, Jänner instead of Januar
‘January’ is a high frequency item once a year for an entire month. These words act as 
identity markers to a greater degree than such type-based counts suggest. 
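
The contrast between type-based and token-based counts drawn above can be made concrete with a short sketch. It is only an illustration: the function names and the toy sentence are assumptions introduced here, not data from any corpus cited in this chapter. Only the figure of two words out of a 40,000-word lexicon comes from the text.

```python
from collections import Counter


def type_share(marked_types: int, lexicon_size: int) -> float:
    """Percentage of the lexicon's types (distinct words) that are national variants."""
    return 100.0 * marked_types / lexicon_size


def token_share(tokens: list[str], variants: set[str]) -> float:
    """Percentage of running words (tokens) in a text that are national variants."""
    counts = Counter(t.lower() for t in tokens)
    hits = sum(counts[v] for v in variants)
    return 100.0 * hits / len(tokens)


# Figure from the text: two words out of a lexicon of, say, 40,000 words.
lexicon_percentage = type_share(2, 40_000)

# Invented toy sample (hypothetical, for illustration only).
sample = (
    "she left her toque in the parkade and went back "
    "to the parkade for her toque"
).split()
sample_percentage = token_share(sample, {"toque", "parkade"})

print(f"type share:  {lexicon_percentage:.3f} per cent")
print(f"token share: {sample_percentage:.1f} per cent")
```

In the invented sample the two words make up a quarter of all running words, although they remain a negligible fraction of the lexicon. A real comparison would of course need token counts from a balanced corpus rather than a constructed sentence.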

While the homogeneity of CanE has left its imprint on the lexicon, the Canadian
Oxford features some 2,200 terms marked as Canadian, including regional terms,
abbreviations, proper names, military ranks, national institutions, and slang terms. The
2,200 words in the dictionary that are marked as Canadian comprise a mere 1.7 per cent of
that lexicon (c.130,000 words). Many of these terms, for example the abbreviations of
military ranks, which figure prominently, are not discussed in public or scholarly discourse,
which renders them ineffective and probably too specialized to take on identity-forming
functions outside of their specific domains, while high token frequency words, such as
parkade or toque, seem to be underestimated in their importance. In AusG, the problem
is the opposite of Canadian homogeneity and the question is which dialectal, colloquial, 
and regional variants should be codified into the standard, which requires the accept- 
ance of corpus planning in the lexicographical process. 

Both varieties, CanE and AusG, have had only comparatively short language-external 
constellations, that is, socio-political conditions, to allow for the codification of stand- 
ard varieties. The OWB was geared towards forming a standard by selecting forms and 
codifying them, but not all regional dialectologists necessarily agree with the concept 
of Austria as an autonomous linguistic centre of its own. Some point to weaknesses in 
selection and marking of a Standard AusG (e.g. Wiesinger 1980; Muhr 1983; Ammon
1995), others seem to be sceptical of the corpus planning processes for AusG in general
(Scheuringer 1996; Pohl 1997). While ‘Austrian German’ plays a significant role in iden- 
tity constructions for Austrians, this awareness is largely limited to traditional dialects 
and there is generally little awareness of features of an Austrian standard. Here, corpus 
planning and dictionary codification would be the best answer. After sixty years of the 
OWB, the results are split and the codified standard is not universally accepted. A search 
of the online archive of the national Austrian paper of record Der Standard (15 February 
2012) displays the dilemma: from 2002 to 2012 (15 February), the OWB is mentioned a 
mere fourteen times, while the German Duden dictionary is referred to 317 times, con- 
firming impressions of the greater importance of German Duden dictionaries outside of 
Austrian schools (Wiesinger, in Ammon 1995: 139). 

Görlach (1990: 1484) referred to the scholarly quality of Canadian dictionaries as
‘small’, which is perhaps somewhat exaggerated, but the best reason for such an assessment
lies in the absence of an academic review culture for English lexicography in
Canada. Critique is certainly plentiful in Austria, where a dispute is in full force that has 
not yet happened in Canada: the critique not only of the purpose of corpus planning in 
dictionaries, but, more generally, of the precise nature of the standard. Such scholarly 
discourse would be a welcome and necessary addition to the discussion of Canadian 
English (e.g. Dollinger 2011b). What both case studies have shown, however, is that 
national dictionaries are at the forefront of corpus planning decisions for standard
varieties and standard languages that trigger wider societal responses that need to be
managed. In any case, the lexicographer’s best answer is the execution of choices that are
informed by both profound language study and the declaration of the societal purpose 
of a given dictionary. 


A CHRONOLOGY OF MAJOR EVENTS IN THE 
HISTORY OF LEXICOGRAPHY 


JOHN CONSIDINE 


This chronology presents a selection of highlights in the world history of the making of lexi- 
cal dictionaries. Beyond giving a sense of when some of the oldest lexicographical traditions 
began, it pays particular attention to the western European languages and, within them, to 
English. 


c.3200 BC Earliest monolingual Sumerian wordlists in cuneiform writing on clay tablets from
level IVa of the city of Uruk. These tablets, of which there are about 670, were used in the 
teaching of the writing system. 

c.2400 BC Earliest bilingual wordlists of Sumerian and the Semitic language Eblaite on
clay tablets in Sumerian cuneiform from the archives of the ancient city of Ebla (at Tell 
Mardikh in modern Syria). 

18th cent. BC? Compilation of the Sumerian-Akkadian lexical tablets known as HAR-ra
= hubullu or Urra-hubullu (this is the first entry, for a word meaning ‘interest-bearing 
debt’), comprising more than 9,700 entries in cuneiform writing. 

18th cent. BC Compilation of the thematically ordered Egyptian list of nouns written in
hieroglyphs on papyrus known as the Ramesseum Onomasticon. Subsequent lists of the same
sort would be compiled for two millennia: the Tebtunis Onomasticon of the 1st-2nd
century AD, a papyrus in hieratic script originally more than ten metres long, exceptionally
includes a section of verbs as well as one of nouns.

a.300 BC Compilation of the Nighantu, a list of Sanskrit words from Vedic texts, which appears
to be the earliest extant lexicographical text from the Sanskrit tradition; a commentary on
it, the Nirukta, appears to be no later than the third century BC, and may be earlier.

c.300 BC Philitas of Cos and Simias (or Simmias) of Rhodes make the first extensive learned
collections of glosses of ancient Greek epic and dialect words, initiating the Greek lexico- 
graphical tradition. Their work only survives in fragments. 

3rd cent. BC? Compilation of the Erya (‘The Ready Guide’), a thematically-arranged
compendium of glosses, covering 4,300 characters, which is the earliest extant Chinese wordlist
and had a long tradition of successors. 

1st cent. BC Beginnings of ancient Latin lexicography, in works such as the lost Liber
glossematorum of Lucius Ateius Philologus.

a.18 AD Compilation of the Fangyan (‘Dictionary of Dialectal Words’) attributed to Yang
Xiong, which glosses regional varieties of Chinese and words from a few other languages,
arguably the earliest dialect dictionary in any tradition.

a.20 AD? Marcus Verrius Flaccus compiles De verborum significatu, the most important work
of ancient Latin lexicography, now known from the second-century abridgement by 
Sextus Pompeius Festus (itself known from fragments and a yet later epitome, made by 
Paulus Diaconus in the eighth century). 

a.149 Xu Shen compiles the Shuowen jiezi (‘Explanatory Dictionary of Chinese Characters’),
registering 9,353 characters classified by shared graphic elements, the source of a tradition 
extending as far as the compilation before 1615 of the Zihui (‘Comprehensive Dictionary 
of Chinese Characters’) of Mei Yingzuo, which registers 33,179 characters. 

2nd cent.? About 120 words of Tamil are explained in the metrical grammatical text 
Tolkappiyam, which may have been compiled gradually over the period 200 Bc to 200 AD. 

5th-6th cent. Hesychius of Alexandria compiles an alphabetic lexicon of obscure ancient
Greek words, with about 51,100 entries, the oldest Greek dictionary to survive in anything 
like its original form. 

6th cent.? Amarasimha compiles the thematically arranged metrical dictionary known as 
Amarakosa, the most famous of the early lexica of Sanskrit. More than eighty commentar- 
ies would be written on it, the earliest around 1000; it would be translated into Burmese, 
Nevari, Tibetan, and Mongolian; and it would be mentioned as a forerunner by P. M. 
Roget in the introduction to his Thesaurus of 1852. 

a.601 Lu Fayan and others compile the Qieyun (‘Dictionary of Chinese Rhymes’), now known 
only from later recensions such as the Jiyun, which was edited by a team of lexicographers 
between 1037 and 1067, and registered 53,525 characters. 

a.636 Isidore of Seville compiles the Etymologiae, a thematically arranged compendium of 
learning in which etymological and encyclopaedic information are intertwined, of which 
nearly a thousand manuscript copies survive. 

c.670-690 Compilation of the archetype of the Epinal-Erfurt Glossary, in which some Latin 
lemmata are glossed in Latin and some in Old English; this is the oldest document in the 
lexicography of English. 

8th cent. Compilation of the first Irish-Latin glossaries, the most important early members
of the tradition being Sanas Cormaic ‘Cormac’s Glossary’ ascribed to Cormac mac
Cuilennáin, king of Munster and bishop (1301 entries in its longest form); O’Mulconry’s
Glossary (874 entries), and Duil Dromma Cetta, ‘The Collection of Druim Cett’ (643
entries). 

8th-10th cent.? Compilation of the thematically arranged Tamil metrical dictionary
Tivakaram, which registers about 9,500 entries. It is extant in about 100 manuscripts, and 
in 32 printed editions from the period 1819-1958. 

late 8th cent. Compilation of the Liber glossarum or Glossarium Ansileubi, a compendium of at 
least 30,000 Latin encyclopaedic notes and short glosses from Isidore and other sources, 
at an unknown Carolingian centre of learning. 

a.791 Al-Khalil ibn-Ahmad compiles Kitab al-‘Ain, the first Arabic dictionary.

a.814 Compilation, for the translators of Buddhist texts, of the Sanskrit-Tibetan sGra sbyor 
bam po gnyis pa; this, or the roughly contemporaneous Mahāvyutpatti, is the first extant
Tibetan dictionary. 

9th cent. Compilation of the Etymologicum genuinum, the most important in a tradition of
Byzantine lexical dictionaries with etymological material. 

9th cent. Beginnings of Hebrew lexicography, known from fragmentary Hebrew-Arabic and
Hebrew-Persian wordlists, and from reports of the lost Arukh of Zemah ben Paltoi. The 
Agron of Saadya Gaon, a wordlist of Biblical Hebrew for the use of poets, traditionally 
dated to 902, is sometimes called the first Hebrew dictionary.

c.900 Shoji compiles the Shinsen jikyo (‘Mirror of Characters Newly Selected’), a Chinese dictionary
registering 21,300 characters, of which some 3,000 have Japanese equivalents; this
is the first dictionary to register a significant number of Japanese words. 

10th cent. Menahem ben Saruk compiles the first systematic dictionary of Biblical Hebrew, the
Mahberet, registering about 2,500 roots and 8,000 derived forms. 

c.1000? Compilation of the Byzantine work known as the Suda or Souda (and formerly as the
lexicon of Suidas), comprising some 30,000 lexical and encyclopaedic entries bearing on 
ancient Greek language and culture. 

a.1053 A compiler known as Papias, perhaps working in Lombardy, completes the Elementarium
doctrinae rudimentum, an alphabetically arranged encyclopaedic dictionary, of which at 
least ninety manuscripts survive. Incunabular editions were printed in Milan and Venice. 

c.1065 Abu Mansur ‘Ali b. Ahmad Asadi Tusi compiles the Lughat-i furs, the first wordlist of
Persian, registering 1,099 words in the primary manuscript (subsequent manuscripts add 
1,192 more). 

c.1075 Mahmud Kasgari compiles the Diwan Lugat at-Turk, a compendium of lexical and cultural
material which is the first major monument of Turkic lexicography.

c.1148 Osbern Pinnock compiles the Panormia or Liber derivationum, a semi-etymological
Latin dictionary in which each entry accumulates words derived from the headword, and 
presents illustrative quotations. Some thirty manuscripts survive. 

c.12th cent. Moggallana compiles the Abhidhanappadipika, the earliest extant lexicon of Pali,
drawing on the Amarakosa of Amarasimha. 

12th cent. Sun Mu compiles the first Chinese-Korean glossary, registering about 350 words 
written in Chinese characters. 

c.1145 Nagavarma II compiles the Abhidhanavastukosa, a Sanskrit-Kannada dictionary, one of
the first lexica to register a Dravidian language other than Tamil. 

c.1200 Hugutio, bishop of Ferrara, compiles his Liber derivationum, a weakly alphabetized 
dictionary drawing on Papias and Osbern, of which over 200 manuscripts survive, some 
with alphabetical indexes. 

c.1218 The word dictionarius is coined by the English-born Parisian teacher John of Garland as 
a title for an elementary Latin textbook. 

1286 Giovanni Balbi of Genoa completes the Catholicon, a major alphabetized Latin diction- 
ary which revises the work of Hugutio. This was the first Latin dictionary, and indeed 
one of the first European books, to be printed: the first incunabular edition bears the date 
1460, though some copies with this date are in fact slightly later reprints. 

late 14th cent. The word dictionarium is used as the title of Pierre Bersuire’s alphabetically 
ordered encyclopedic guide to the interpretation of words in the Vulgate. 

c.1440 The Promptorium parvulorum, compiled by an anonymous Dominican friar, registers
about 12,000 English words with Latin equivalents; this is the first substantial dictionary 
with English lemmata. 

1492 Antonio de Nebrija, Lexicon hoc est dictionarium ex sermone latino in hispaniense, a 
Latin-Spanish dictionary of between 28,000 and 30,000 entries (including multiple 
entries for different senses of a given word), followed by his Spanish-Latin Dictionarium 
ex hispaniensi in latinum sermonem in 1495, initiates the post-medieval lexicography of 
Spanish. It would be much used by bilingual lexicographers of Spanish with European 
and American languages. 

1502 First edition of the Latin Dictionarium of Ambrogio Calepino. 210 further editions would 
appear, the last in 1779; many of these would be polyglot expansions of the original work, 
including (in 1595) the first printed dictionary of Japanese and a European language.

1721 Nathan Bailey, Universal Etymological English Dictionary (further editions to 1802), followed
by a so-called second volume in 1727 (actually a separate dictionary; further editions
to 1776), a folio Dictionarium Britannicum in 1730, and a revised work issued as A
New Universal Etymological English Dictionary in 1755, ed. Joseph Nicol Scott, and regis- 
tering about 65,000 words. 

1726-39 First edition of the dictionary of the Spanish Academy, Diccionario de la lengua castel- 
lana, published in six volumes. A one-volume redaction without illustrative quotations 
was published 1780, with a 22nd edition 2001. 

1747 Samuel Johnson, Plan of a Dictionary of the English Language published early August. 

1755 Samuel Johnson, Dictionary of the English Language published 15 April in two folio vol- 
umes, registering some 43,000 lemmas, and providing a dictionary of English as elab- 
orate, and as richly supported with quotations, as those of the Accademia della Crusca 
and the Real Academia Española. Abridged editions appeared 1756 onwards, as did a
succession of editions of the full dictionary, the fourth (of 1773) being heavily revised by 
Johnson; a four-volume revision by H. J. Todd appeared in 1818 and a new revision by 
R. G. Latham in 1866-1870. 

1774 First volume (A-E) of Johann Christoph Adelung, Versuch eines vollständigen
grammatisch-kritischen Wörterbuches der Hochdeutschen Mundart, the first substantial
monolingual dictionary of German; the fifth and last volume (W-Z) was published 1786,
and a new edition, Grammatisch-kritisches Wörterbuch der Hochdeutschen Mundart,
1793-1801. 

1793 First volume (A-E) of the first dictionary of the Danish academy, Dansk Ordbog udgiven 
under Videnskabernes Selskabs Bestyrelse; the 8th and last volume (V-Z) finally appeared 
in 1905. 

1806 Noah Webster’s first dictionary, A Compendious Dictionary of the English Language, reg- 
isters about 40,000 lemmas and promises a greater work to come; this would be pub- 
lished in 1828. 

1807 First volume (A-F) of the first monolingual dictionary of Polish, Samuel Gottlieb Linde’s 
Słownik języka polskiego, running to some 60,000 entries illustrated with several hundred
thousand citations, published in December; the 6th and last (U-Z) would appear in
February 1815 (dated 1814 on the title page). 

1808 John Jamieson, An Etymological Dictionary of the Scottish Language (2 vols, with a 
2-volume supplement in 1825) makes pioneering use of accurately referenced and chrono- 
logically ordered illustrative quotations. A revised edition, ed. John Longmuir and David 
Donaldson, was published 1879-87. 

1812 Franz Passow, Über Zweck, Anlage, und Ergänzung griechischer Wörterbücher makes the
first explicit statement of the historical principles, already latent in Jamieson’s Scots dictionary
of 1808, on which much of the scholarly lexicography of the nineteenth and subsequent
centuries would be founded.

1818 First instalment of Charles Richardson's dictionary of English published 14 February, 
as part of the first fascicle of the Encyclopaedia Metropolitana, under the general edi- 
torship of Samuel Taylor Coleridge; the 58th and last fascicle of the main body of the 
Encyclopaedia was published in 1844, and the Encyclopaedia was then reissued in 25 
bound volumes in 1845, the dictionary having been separately issued (with revision of the 
material already published in fascicles) as A New Dictionary of the English Language in 
1836-37, with subsequent editions 1855 and 1875. Its use of chronologically ordered arrays 
of quotations anticipated that of the Oxford English Dictionary.

1819 John Pickering, A Vocabulary, or, Collection of Words and Phrases which have been supposed
to be peculiar to the United States (first handwritten draft written 22 February 1810-1
January 1813) provides the first free-standing printed wordlist of a variety of English from 
beyond the British Isles. 

1828 Noah Webster, American Dictionary of the English Language published November (the 
manuscript having been completed January 1825) in two large quarto volumes, register- 
ing some 70,000 lemmas, and showing the influence of Johnson's Dictionary as revised by 
Todd in 1818. It would be followed by an abridgement, ed. Joseph Emerson Worcester, in 
1829 (with numerous subsequent editions); by a second edition, ed. Webster, dated 1840 
but published in 1841; by a new revised edition, ed. Chauncey A. Goodrich and published 
by the Merriam-Webster Company, in 1847; and by a revision for the British market with 
over 2,000 pictorial illustrations, The Imperial Dictionary of the English Language, ed. 
John Ogilvie (1847-50, supplement 1855), itself further revised and enlarged in 1882 (four 
volumes), ed. Charles Annandale. 

1830 Publication of Worcester’s Comprehensive Pronouncing and Explanatory Dictionary of the 
English Language initiates a vigorous rivalry between editions of this dictionary (notably 
Worcester’s Universal and Critical English Dictionary of 1846 and its successors up to A 
Dictionary of the English Language of 1860) and of Webster's, lasting until 1864. 

1834 First volume (A-C) of Wilhelm Freund, Wörterbuch der lateinischen Sprache nach
historisch-genetische Prinzipien published; the fourth and final volume would be pub- 
lished in 1845, and the whole would be translated into English by E. A. Andrews and 
revised by Charlton T. Lewis and Charles Short in 1879 (Short revised A, Lewis B-Z), with
an abridgement in 1890, becoming the principal English-Latin dictionary for nearly a 
century. 

1838 Jacob and Wilhelm Grimm undertake a German dictionary project, which would result 
in the first major dictionary of a vernacular language to be compiled on historical princi- 
ples, and the first to be informed by the new comparative philology of the nineteenth cen- 
tury. The undertaking was proposed 3 March, and announced in the Leipziger allgemeine 
Zeitung 29 August. Publication would begin in 1852. 

1841 Maximilien Paul Emile Littré undertakes a French dictionary project with the working 
title Nouveau dictionnaire étymologique de la langue française, which would result in the
first complete historically-oriented dictionary of modern French; editing began in ear- 
nest in 1847. 

1843 H. G. Liddell and R. Scott, A Greek-English Lexicon, based on Franz Passow’s revision
(1819) of J. G. Schneider’s Kritisches griechisch-deutsches Handwörterbuch beym Lesen der
griechischen profanen Scribenten zu gebrauchen (1797-98), brings the historical principles 
set out by Passow in 1812 into English lexicographical practice. 

1852 First fascicle (A-Allverein, ed. Jacob Grimm) of the Grimms’ Deutsches Wörterbuch
published 1 May. A-C and E-Frucht would be edited by Jacob Grimm before his death 
in 1863, and D by Wilhelm between 1855 and his death in 1859. Publication of the 380th 
and last fascicle, widrig-Wiking, in January 1961 (dated 1960) completed the alphabetical
sequence of the dictionary; a volume of sources followed in 1966-71, and revised volumes 
from 1983 onwards. Editorial responsibility was divided between many scholars: work on 
the letter G, for instance, was initiated by Rudolf Hildebrand in a fascicle published in 
1872 and completed by an editorial team with a fascicle published in 1958. The dictionary 
as completed in 1961 ran to 33,872 pages (equivalent to 22,421 OED-sized pages) in sixteen 
two-part volumes, and registered about 250,000 main and 70,000 subsidiary entries.

1852 Peter Mark Roget, Thesaurus of English Words and Phrases published 11 June (printing, in
a run of 1,000, began in March).

1857 Richard Chenevix Trench’s two-part presentation ‘On Some Deficiencies in our English 
Dictionaries’ to the Philological Society of London, 5 and 19 November, followed 7 January 
1858 by the presentation to the society of Trench’s scheme for a new English dictionary, ini- 
tiates the New English Dictionary / Oxford English Dictionary (NED / OED) project. 

1859 Printing of Littré, Dictionnaire de la langue française begins 27 September. A total
of 30 fascicles were published, the first (a-air) no later than February 1862 and the last 
(vindicativement-end) 25 September 1872; the dictionary was also issued in two volumes, 
each of two parts, dated 1863 (A-C, D-H) and 1872 (I-P, Q-Z), with a supplement (additions
and corrections, with a ‘Dictionnaire étymologique de tous les mots d’origine orientale’
by Marcel Devic) in 1877. The completed dictionary ran to 4,646 OED-sized pages
and comprised 78,423 entries, illustrated by 293,009 quotations. 

1863 First fascicle (A-Aanhaling) of the Woordenboek der Nederlandsche taal (WNT), ed.
Matthias de Vries, published; the 686th and last, completing vol. 39 (zuid-zythum), was 
published 16 June 1998, with a single-volume supplement of revised entries in A pub- 
lished in fascicles 1942-56, and three more supplementary volumes, registering c.30,000 
new words, in 2001. The completed dictionary, of 49,255 pages (equivalent to 25,038 
OED-sized pages), comprises 400,000 or more entries under c.95,000 main headwords,
with 1,700,000 illustrative quotations; the chronological and geographical scope of the 
NED were greater, and the NED also presented more quotations, but the WNT is the larg- 
est European dictionary in terms of page count and entry count. 

1863 First volume of Vladimir Ivanovich Dal, Tolkovyi slovar zhivogo velikorusskogo yazika, the
major Russian dictionary of the nineteenth century; the fourth and last would be pub- 
lished in 1866, and the dictionary as a whole would register more than 200,000 words, 
presented in semi-etymological order. 

1864 A revision of Webster's American Dictionary of the English Language, ed. Noah Porter with
etymologies by C. A. F. Mahn, revitalizes the Webster tradition; subsequent editions would
appear in 1879 (with New Words section and biographical supplement), 1882, and 1884. 

1868 Friedrich Christian August Fick, Wörterbuch der indogermanischen Grundsprache, the
first scholarly dictionary of a reconstructed language (in this case, Proto-Indo-European); 
a fourth edition was published 1890-1909. 

1884 First fascicle (A-Ant) of the NED published 29 January by the Clarendon Press (the aca- 
demic imprint of Oxford University Press), ed. James Murray on the basis of materials 
gathered under his editorship and those of Herbert Coleridge and F. J. Furnivall. Murray 
would be responsible for the volumes containing A-D, H-K, O-P, and T before his death
in 1915; Henry Bradley would join him as chief editor of E-G, L-M, S-Sh, St, and (with
Craigie) W-We before his death in 1923; William Craigie would be chief editor of N, Q, R,
Si-Sq, U-V, W-We (with Bradley), and Worm-Wy; Charles Onions of Su-Sz, Wh-Worling,
and X-Z. From the fascicle Deceit-deject (1895), the title Oxford English Dictionary would 
also be used. The completed dictionary, of 15,490 pages, comprised 252,200 entries, with 
1,861,200 illustrative quotations. 

1889 First fascicle (A-Appet) of the Century Dictionary published on or before 17 June, 
ed. William Dwight Whitney, based on Annandale’s revision of Ogilvie’s Imperial
Dictionary, and thus ultimately a descendant of Webster's American Dictionary of 
1828. The 24th and last fascicle would appear in 1891, with a preface dated 1 October; 
the dictionary was issued in six volumes on or before 7 December, registering about
215,000 words; a ten-volume issue would appear in 1896 as The Century Dictionary
and Cyclopedia, and a twelve-volume revision in 1911, both ed. Benjamin Eli Smith; an 
abridgement, the New Century Dictionary, would appear in 1927, with subsequent edi- 
tions and reworkings. 

1890 First edition of Webster's International Dictionary, ed. Porter et al., a thorough revision of 
the American Dictionary of 1864-84, comprising 175,000 entries. A further edition with a 
supplement of 25,000 entries would appear in 1900, ed. William T. Harris, and a Webster's 
New International Dictionary, with 400,000 entries, in 1909, also ed. Harris.

1893 First volume of Funk & Wagnall’s Standard Dictionary of the English Language; the second 
appeared in 1894, and the two together included 304,000 entries. A revision would appear 
in 1913 as the New Standard Dictionary of the English Language, offering 450,000 entries. 

1893 First fascicle of the Swedish national dictionary on historical principles, Ordbok öfver
svenska språket (better known as Svenska Akademiens Ordbok), published, after a false
start in 1870. Fascicles 369-373 (Trivsel-Tyna) appeared in July 2009.

1896 First part (A-Ballot) of Joseph Wright's English Dialect Dictionary published 1 July. 
The 30th and last part (the second half of the Dialect Grammar) would appear in 
September 1905. 

1900 First fascicle of Thesaurus linguae Latinae, the most comprehensive of all Latin diction- 
aries, published. Fascicle xvii of vol. X. 2 (pulso-pyxodes) was published in 2009. 

1908 First volume of Eliezer Ben-Yehuda’s Milon ha-lashon ha-‘Ivrit ha-yeshanah
veha-hadashah / Complete Dictionary of Ancient and Modern Hebrew, a major document in the
revival of the Hebrew language; the 17th and last would appear in 1959. 

1911 First edition of the Concise Oxford Dictionary (COD) published 16 June, ed. H. W. Fowler 
and F. G. Fowler. The twelfth edition, issued as The Concise Oxford English Dictionary, was
published in 2011; COD has been the basis for dictionaries of several varieties of English. 

1918 First volume of the Danish national dictionary on historical principles, Ordbog over det 
danske Sprog, founded by Verner Dahlerup and ed. Harald Juul-Jensen (to 1949) and 
Jørgen Glahder (after 1949) published; the 28th and last volume in the main alphabetical
series would appear in 1956, with five supplementary volumes 1992-2005, ed. Anne
Duekilde and Henrik Andersson. 

1919 William Craigie’s presentation ‘New Dictionary Schemes’ to the Philological Society, 
4 April, initiates the so-called period dictionaries of English, multi-volume histori- 
cal dictionaries intended to treat particular areas more fully than was possible for the 
NED / OED. 

1922 First fascicle of Französisches etymologisches Wörterbuch, general ed. Walther von
Wartburg, the most extensive etymological treatment of any European language; the 
25th and last volume was completed in 2002 with the publication of the 162nd fascicle 
(completing the revision of the range A), bringing the dictionary to a total of 16,707 
pages, since when, revised articles in B have been made available online. 

1927 First volume of Vergleichendes Wörterbuch der indogermanischen Sprachen, ed. Julius
Pokorny from the materials of Alois Walde; the third and last volume, an index, would 
appear in 1932, followed by Pokorny’s Indogermanisches etymologisches Wörterbuch
(2 vols., issued in parts 1948-69, fifth edition 2005), and by the Indo-European 
Etymological Dictionary Project initiated at Leiden in 1991. 

1928 128th and last fascicle (Wise-Wyzen) of the NED / OED published 19 April, ed. C. T. 
Onions (Wise-Worling) and William Craigie (Worm-Wyzen). The fascicle XYZ, ed. 
Onions, had appeared 6 October 1921. Completed NED issued in ten volumes.

1931 First fascicle (A-Assemble) of A Dictionary of the Older Scottish Tongue (DOST), ed.
William Craigie. DOST would be completed in 12 volumes in 2002 (1-2 ed. Craigie, 3 by 
Craigie and A. J. Aitken, 4-8 by Aitken et al., 9-12 by Margaret Dareau et al.). 

1931 First fascicle (A-Aggle) of The Scottish National Dictionary (SND), ed. William Grant. 
SND would be completed in 10 volumes in 1976 (1-2 ed. Grant, 3 by Grant and David 
Murison, 4-10 by Murison), with a supplement ed. Iseabail Macleod in 2005. 

1933 The Shorter Oxford English Dictionary on Historical Principles published in two volumes 17 
February, ed. C. T. Onions, H. W. Fowler, and Jessie Coulson. A sixth edition, ed. Angus 
Stevenson, was published in 2007.

1933 The Oxford English Dictionary, being a corrected re-issue with an Introduction, Supplement, 
and Bibliography of A New English Dictionary on Historical Principles, published in 
November, ed. Craigie and Onions, in twelve volumes (the Supplement had been made 
available to subscribers to the dictionary on 21 September). It would be reissued in micro- 
graphic form (four pages of the original to one page of the reissue) in two volumes in 1971, 
thereafter becoming readily affordable to very many readers. 

1934 Webster's New International Dictionary, Second Edition published 25 June, ed. William 
Allan Neilson, presenting 552,000 entries, with over 12,000 pictorial illustrations. 

1935 First edition of A New Method English Dictionary, ed. Michael West and James Endicott, 
the first significant monolingual learners’ dictionary of English (second edition 1965). 

1936 First fascicle of Dictionary of American English (i.e. of words originating in America or 
significant in American life, documented to about 1900), ed. William Craigie et al.; the 
20th and last fascicle would appear in 1944, and the dictionary would be issued in four
volumes in that year. This was the first of the period dictionaries proposed in 1919 to be 
completed; it was followed in 1951 by the two-volume Dictionary of Americanisms, ed. 
Mitford M. Mathews, which focuses more closely on words of American origin, and takes 
their history up to the time of compilation. 

1942 Idiomatic and Syntactic English Dictionary, ed. A. S. Hornby, E. V. Gatenby, and 
H. Wakefield, published in Tokyo; this dictionary would be reprinted photographically 
by Oxford University Press as A Learner’s Dictionary of Current English in 1948 and reis- 
sued as The Advanced Learner's Dictionary of Current English in 1952; this and successive 
editions had sold more than 14,000,000 copies by the early 1990s. The eighth edition was 
published in 2010. 

1950 First fascicle of Paul Robert, Dictionnaire alphabétique et analogique de la langue française.
The first volume (A-C) would appear 15 October 1953 and the sixth and last (Recr-Z)
September 1964, with a supplement 1970; third edition, ed. Alain Rey and Josette Rey-Debove,
2001. Le Petit Robert, a one-volume abridgement, ed. Rey (first edition 1967), is in
widespread use. 

1950 First fascicle of Geiriadur Prifysgol Cymru: A Dictionary of the Welsh Language
(a-anghynanadwy), the first full historical dictionary of Welsh. The 61st and final fascicle
(ymlidiaf-Zwingliaidd) was published in 2002. 12 fascicles of the second edition have been
published and the dictionary went online at <http://gpc.wales> on 26 June 2014. 

1951 First volume of the major scholarly dictionary of Afrikaans, Woordeboek van die 
Afrikaanse taal, ed. Pieter Cornelis Schoonees, published 7 May; vol. 13 (R, ed. W. F.
Botha) was published 2009.

1952 First fascicle (E-Endelonges) of the Middle English Dictionary. The dictionary would be 
completed in thirteen volumes, chief editors Hans Kurath (A-F), Sherman M. Kuhn with 
John Reidy (G-P), and Robert E. Lewis et al. (Q-Z), the whole comprising 54,081 entries,
supported by 891,531 illustrative quotations. The 115th and final fascicle of the alphabetical
sequence, X-Z, was published July 2001, followed by a revised Plan and Bibliography
in 2007. 

1956 First volume (H) of Chicago Assyrian Dictionary / The Assyrian Dictionary of the Oriental 
Institute of the University of Chicago, ed. Leo Oppenheim, appears to very harsh criticism; 
the 20th and last volume (U/W, ed. Martha T. Roth) would be published in the winter of
2010-11, the 21st (Z) having already appeared. 

1961 Webster's Third New International Dictionary of the English Language, Unabridged,
published 28 September (with at least some entries released to the press as early as 6 
September), ed. Philip Gove; attacks on the supposed permissiveness of the dictionary 
began in early September, the most violent being published in 1962. 

1967 F. G. Cassidy and R. Le Page, Dictionary of Jamaican English published in or before June,
drawing on four centuries of written record and on oral usage; a second edition would 
follow in 1980. 

1968 First fascicle (A-Calcitro) of Oxford Latin Dictionary, ed. P. G. W. Glare (8th and final 
fascicle published on schedule in 1982), offers a new record of the vocabulary of ancient 
Latin. 

1969 American Heritage Dictionary of the English Language published 15 September, edited by 
William Morris, with an etymological appendix by Calvert Watkins, notes on disputable 
items by a usage panel, and numerous photographic illustrations; fifth edition: November 
2011, ed. Steve Kleinedler.

1971 First volume (A-Affiner) of the Trésor de la langue française, the fullest dictionary of
modern French, published, ed. Paul Imbs (vols 1-7) and Bernard Quemada (vols 8-16); 
the 16th and last volume (Teint-Zzz) was published in 1994. The whole dictionary regis- 
ters 100,000 words, with 430,000 illustrative quotations. 

1972 First volume (A-G) of OED Supplement published 12 October, ed. R. W. Burchfield, to be 
followed by three more — H-N (4 November 1976), O-Scz (15 July 1982), and Se-Z (8 May 
1986) — comprising 69,300 entries, supported by 527,000 illustrative quotations. 

1984 First volume of the major scholarly dictionary of Frisian, Wurdboek fan de Fryske taal, 
published 9 November, ed. Klaas van der Veen. The 25th and last volume was published 11 
November 2011. 

1985 First volume (A-C) of Dictionary of American Regional English published early 
September, ed. F. G. Cassidy et al., and drawing on fieldwork with 2,777 informants in 
1,002 US communities, and on written sources. The last in the alphabetical sequence (Sl-Z,
ed. Joan Houston Hall) appeared in March 2012, with a final volume of maps, indexes,
and questionnaire responses published in 2013. 

1986 First fascicle (D) of the Dictionary of Old English published, ed. Angus Cameron et al., 
registering 897 headwords on 951 pages in microfiche. It has been followed by microfiche 
and electronic publications of A-C and E-G. 

1989 Second edition of the OED published 30 March, ed. John Simpson and Edmund Weiner, 
in twenty volumes, comprising 291,500 entries, supported by 2,436,600 quotations. 

1996 Richard Allsopp, Dictionary of Caribbean English Usage transcends national boundaries 
to register the standardizing English of the Anglophone Caribbean as a whole. 

2000 OED Online launched 14 March. 

2002 Trésor de la langue française informatisé launched online 5 March.

2009 Publication of the Historical Thesaurus of the OED, ed. Christian Kay, Jane Roberts, 
Michael Samuels, and Irene Wotherspoon.

Chambers, J. K. 596

Chambers, Robert 15; see also Chambers 
dictionaries 

Chambers Thesaurus 445-6, 449, 451, 453 

Chambers’s Twentieth Century Dictionary: see
Chambers Dictionary

Chambers 21st Century Dictionary: see
Chambers Dictionary 

Chambers, William 15; see also Chambers 
dictionaries 

Chambon, Jean-Pierre 340 

Champlain Society Digital Collection 210 

Chan, Alice Y. W. 132 

Chapman, Robert L. 327, 333, 379 

Chardonnens, L. S. 493 

Charniak, Eugene 456 

Chartier, R. 14 

Chauveau, Jean-Paul 343 

Chesterfield, Philip Stanhope, 4th Earl of 472,
550, 551, 559

children’s dictionaries, children and 
dictionaries, school dictionaries 7, 8, 13, 
15, 21, 130, 132, 153, 320, 323, 373, 458, 464, 
468-9, 471, 516, 521, 522, 532, 538, 552, 586, 
598, 599, 600, 603 

Chomsky, Noam 437 

Church, Kenneth W. 67, 68, 83, 412, 
432, 449 

citations: see quotations 

Clark, Cecily 278 

Clark, John 307 

Clarke, Stella Dextre 454 

Clinton, Henry Fynes 170 

Clyne, Michael 591, 595 

Coates, Richard 256, 289 

Cockeram, Henry 1, 609 

cognitive equivalence 149-50, 154 

cognitive linguistics 114, 376, 429, 433-6 

Cole, Ann 260, 265, 267 

Coleman, Julie 290, 327, 328, 330, 331, 332, 333,
335, 337

Coleridge, Derwent 547, 549

Coleridge, Herbert 169, 556, 612 

Coleridge, Samuel Taylor 610 

college dictionaries 16, 22 

Collet, Tanja 405 

Collinot, A. 15 

Collins 26, 557 

Collins, Beverley 293, 294 

Collins Canadian Dictionary 207, 212 

Collins Cobuild Advanced Dictionary: see 
Collins Cobuild English Language 
Dictionary

Collins Cobuild Advanced Learner's
Dictionary: see Collins Cobuild English 
Language Dictionary 

Collins Cobuild English Dictionary: see Collins 
Cobuild English Language Dictionary 

Collins Cobuild English Language 
Dictionary 28, 29, 34, 35, 37-8, 41, 79, 83, 93,
102, 116-18, 122, 130-1, 138, 139, 432, 453, 460, 
467, 471, 496, 497, 585, 587 

Collins Cobuild Student's Dictionary 141 

Collins Concise English Dictionary 466, 497 

Collins Concise Spanish Dictionary 151-2 

Collins Dictionary and Thesaurus of the English 
Language 481 

Collins Dictionary of the English 
Language 14, 16-17, 18, 19, 21, 464-6,
478, 481, 582

Collins English Dictionary: see Collins 
Dictionary of the English Language 

Collins English Dictionary Express 466 

Collins Gem Dictionary 466 

Collins German College Dictionary 152, 156

Collins Junior Dictionary 468-9 

Collins Robert French Dictionary 46-7, 48, 50,
57-8, 151-2, 154, 158, 159, 511

Collins School Dictionary 468-9 

Collins Student’s Dictionary 468-9 

collocations, collocates 27, 37-8, 39, 49-51, 
54, 60, 68, 82-3, 84, 85, 86, 87, 91, 92, 93, 118, 
122, 125, 130, 131, 140, 157, 411-24, 443, 506, 
526, 551 

colloquial language 18, 158, 184, 185, 194-5, 
201, 205, 317, 326, 327, 337, 386, 402, 440, 
448, 449, 489, 491, 494, 495, 498, 499, 548, 
551, 593, 594, 595, 598, 602; see also slang, 
register, standard and non-standard lexis 

colour, use of 22, 29 

commercial considerations 22, 24, 33, 40, 42, 
66, 142, 175, 271-2, 333-5, 336, 356, 370, 381, 
562, 563-4, 571, 579-89 

Common Language Resource Infrastructure 
(CLARIN) 63 

componential analysis 428-9 

compositionality (semantic) 413, 418-21 

compounds 85, 230 

A Concise Dictionary of Modern Place-Names 
in Great Britain and Ireland 267 


Concise New Partridge Dictionary of 
Slang 470-1 

Concise Oxford Dictionary 17,19, 20, 21, 22, 27, 
106-7, 125, 466, 477, 478-9, 480, 497, 613 

Concise Oxford Dictionary of English 
Etymology 339, 346-7 

Concise Oxford Dictionary of English Place 
Names 265, 267 

Concise Oxford Dictionary of 
Mathematics 399, 400, 401, 402 

Concise Oxford Dictionary of Music 469-70 

Concise Oxford English Dictionary: see 
Concise Oxford Dictionary 

Concise Oxford Russian Dictionary 155, 157 

Concise Oxford Spanish Dictionary 150, 
151-2, 159 

concordance lines 54, 80-81 

confusable items 136 

Conklin, Harold 429 

connotation 53, 138, 148, 158, 364, 441, 448, 
449, 494, 524 

Considine, John 166, 169, 171, 175, 178, 189, 
237, 490, 599 

construction grammar 118, 119 

contrast (in meaning relations) 450-6 

conversion (in word formation) 224-5, 
228-9, 536 

Convery, Cathal 573 

Cook, James 269 

Coote, Edmund 11 

Cop, Margaret 528 

copyright 74-5 

Corbeill, Anthony 353, 362, 364, 366 

core vocabulary 126, 131, 136, 295, 532 

corpora 1, 4, 19, 28, 42, 43, 45, 53-6, 57, 62-93, 
103, 116-22, 130-32, 153, 174, 199, 203-20, 235, 
354, 376, 401, 419, 422-4, 432, 433, 443, 453, 
454, 456, 496, 501-14, 529, 532, 533, 557, 559, 
565, 577, 581, 582 

Corpus de Referencia del Español Actual 
(CREA) 78 

Corpus del Español 174 

corpus design and construction 62-75 

Corpus Diacrónico del Español 
(CORDE) 174 

Corpus Encoding Standard for XML 
(XCES) 71 
Corpus of Contemporary American English 
(COCA) 63, 215 

Corpus of Historical American English 
(COHA) 206, 215, 220 

Corpus of Late Modern English 205 

corpus tools 76-93 

Cortelazzo, Manlio 346 

Coseriu, Eugenio 429 

Cottle, Basil 279, 282, 285, 287 

Coulmas, Florian 592 

count nouns, countability 228, 460, 497 

Couturat, Louis 101 

Covarrubias, Sebastian 12, 165 

Cowie, Anthony P. 26, 27, 34, 49, 125, 126, 132, 
142, 411, 458, 460, 462, 523 

Craigie, Sir William 548-9, 612, 613, 614 

cross-references 39, 402, 406, 422, 509, 570 

crossword puzzles 379 

crowd-sourcing 142 

Cruse, D. Alan 148, 444, 446 

cultural counterparts 154-6 

cultural information 512-31; see also 
encyclopaedic material 

cultural meanings 138 

Cuyckens, Hubert 433 

Czech National Corpus 62, 84 


Dahlerup, Verner 171, 613 

d'Alembert, Jean Le Rond 9 

Dalgarno, G. 372 

Dalzell, Tom 333, 334 

DanNet 582 

DANTE 90, 424, 504, 566, 568, 575, 577 

Darmesteter, Arsène 428 

Dashkova, Princess Ekaterina 166 

data mining 69, 582 

Davies, John 177 

Davies, Mark 63,78 

Davies, Matt 452 

dead languages (dictionaries of) 146, 166, 
195, 350-66 

Decesaris, Janet 585 

de Cillia, Rudolf 595, 597, 601 

decoding (as opposed to encoding) 26, 38, 45, 
49, 56, 138, 140-1, 144-5, 403, 405, 411, 462 

de-contextualization 9 

DeepDict Lexifier 84 


definiendum 96, 118, 131 

definiens 96, 131 

defining vocabulary 20, 25, 26, 27, 29, 30-1, 34, 
126, 131, 135, 137 

definitions 19-20, 31-3, 34-5, 90, 94-160, 
176-88, 193, 199, 200-2, 221-35, 401-3, 443, 
448-9, 451, 453-4 

Den Danske Ordbog 528, 582 

denotation 53, 148 

Depuydt, Katrien 173 

DeReKo 63, 66, 83 

de Schryver, Gilles Maurice 41, 42, 43, 91, 153, 
454, 507, 521, 586 

descriptivism 8, 179, 301, 319, 396, 490-4, 498, 
546-60, 591 

de Tier, Veronique 173 

de Tollenaere, F. 490 

Deutsches Fremdwörterbuch 342 

Deutsches Rechtswörterbuch 172 

Deutsches Referenzkorpus: see DeReKo 

Deutsches Wörterbuch (DWB) 169-71, 174, 
427, 611-12 

de Vries, Matthias 170, 427, 612 

deWaC 63 

dialect (definition of) 384-5 

Dickey, Eleanor 361 

dictionaries for general readers 1-2, 4, 7-24, 
66, 464 

dictionaries of proper names 9, 24, 
255-91, 339 

Dictionary of American Family Names 279, 
280, 282, 283, 285, 287, 288 

Dictionary of American Idioms 422 

Dictionary of American Regional English 172, 
383-4, 391, 615 

Dictionary of British Place-Names 266, 268 

Dictionary of British Surnames 278, 280, 286 

Dictionary of Canadianisms on Historical 
Principles 175, 204, 206-20, 599 

Dictionary of Classical and Theoretical 
Mathematics 395, 402 

Dictionary of English Surnames 278, 279, 280, 
281, 282, 284, 285, 286, 287, 289 

Dictionary of Family Names in Britain and 
Ireland 278, 279, 280, 281, 282, 283, 284, 287, 
288, 289 

A Dictionary of First Names 273, 274, 275 
Dictionary of Historical and Comparative 
Linguistics 344 

Dictionary of Islamic Names 274 

Dictionary of Lake District Place-Names 266 

Dictionary of Lexicography 338, 339, 
344-5, 374 

Dictionary of Linguistics and 
Phonetics 546, 556 

Dictionary of Muslim Names 272 

Dictionary of Newfoundland English 212, 
387, 596 

Dictionary of Nicknames 289 

Dictionary of Old English 173, 204, 220, 264, 615 

Dictionary of Prince Edward Island English 212 

Dictionary of Slang and Unconventional 
English 327-8, 333, 337 

Dictionary of Standard Slovene 92 

Dictionary of Surnames 279, 280, 281, 282, 283, 
284, 285-6, 287, 288 

Dictionary of the Irish Language 198 

Dictionary of the Place-Names of Wales 266 

Dictionary of Yorkshire Surnames 279, 282, 
284, 287, 288 

Dictionary Writing Systems 561-78 

Dictionnaire de l'Académie française 166, 398, 
551, 609 

Dictionnaire des anglicismes 342-3 

Dictionnaire des emprunts au russe dans les 
langues romanes 343 

Dictionnaire du français contemporain 21 

Dictionnaire Etymologique de l'Ancien 
Français 345, 348 

Dictionnaire Etymologique Roman 340 

Dictionnaire phonétique de la langue 
française 293, 295 

Diderot, Denis 9 

Diehr, B. 586 

differentiae 19, 99, 108, 115, 128, 401, 445 

The Digital Exposure of English Place-Names 
(DEEP) project 269-70 

Digitales Wörterbuch der deutschen Sprache 
(DWDS) 84, 528 

di Sciullo, Anna Maria 421 

discontinuity of use 242, 245-6 

discrete classification (in grammatical 
analysis) 222 

distributional semantics 429 


Dixon, James 549 

Dixon, R. M. W. 225, 232 

Dizionario etimologico della lingua 
italiana 346, 347-8 

DK Illustrated Oxford Dictionary 464-6 

Dobrovol’skij, Dmitrij 160, 421 

Dobson, E. J. 251 

document clustering 69 

Doke, C. M. 153 

Dollinger, Stefan 203, 204, 206-14, 215, 596, 
599, 603 

domain (subject or topic) 68, 69-70, 77, 88, 
185-6, 205, 368, 377, 393-407, 490, 500 

Dornseiff, Franz 379, 430 

Dryden, John 553-4 

DTD 91, 564, 567-9, 575, 577 

Duden, Konrad 319 

Duden—Das Bildwörterbuch 521 

Dunning, Ted 432 

Durkin, Philip 172, 237, 238, 244-6, 342, 347 

Dürscheid, Christa 311 

Duval, A. 47, 52 

DWDS corpus 62 

Dyche, Thomas 558 

Dziemianko, Anna 42, 587 

Dziubalska-Kołaczyk, Katarzyna 302 


Early Canadiana Online 210 

Early English Books Online (EEBO) 174, 190-1, 
204-5, 250, 540, 544 

Early English Text Society 189 

editing software (for dictionary 
production) 561-78 

Edmonds, Philip 433, 508 

Eickmans, Heinz 170 

Eighteenth-Century Collections Online 
(ECCO) 174, 190-1, 204, 250, 540, 544 

Eisenberg, Peter 310-11, 318 

Eisenreich, Günther 396 

Ekwall, Eilert 267, 286 

electronic data sources 190, 203-20; see also 
corpora 

electronic dictionaries and electronic 
publication 22-3, 25, 26, 27, 29, 32, 40-1, 
42-3, 45, 142, 173, 175, 288-9, 307-8, 335, 380, 
407, 422-3, 435, 454-6, 497-8, 501-14, 517, 
521-2, 579-89 
elexiko 523, 528, 529, 530 

elliptical uses 230-1, 239 

Ellis, Alexander J. 475, 483 

elovivo 522 

Emblen, D. L. 379 

Emsley, Bert 293, 296, 305, 307, 309 

Encarta World English Dictionary 21 

encoding (as opposed to decoding) 20, 25, 26, 
27, 35, 37, 38, 45, 49, 51-2, 56, 60, 138, 140-1, 
144-5, 374-5, 403, 405, 411, 462 

encoding (in corpus construction) 71-3 

Encyclopaedia Britannica 9, 22, 584 

encyclopaedias 8-9, 15, 22, 24, 524, 527, 584 

encyclopaedic material (in 
dictionaries) 13-14, 15, 16, 17, 18, 158, 394, 
401, 434, 436, 515-31 

endangered languages 4 

Enderle, Ursula 311 

Endicott, James 26, 614 

end matter: see back matter 

Engelberg, Stefan 528 

English Dialect Dictionary 171-2, 195, 382-3, 
384, 385, 388, 390, 613 

English Dialect Society 382, 390 

English Field-Names: A Dictionary 267 

English Inn and Tavern Names 267 

English Place-Name Society 257, 260, 266, 270 

Enlightenment (European) 9, 94 

equivalence 52-3, 61, 146, 148-54, 403-4 

Esling, John 306 

Esposito, J. 581, 584 

Estienne, Henri 165-6, 178, 608 

Estienne, Robert 10, 165-6, 178, 608 

etymological dictionaries 24, 165, 236, 
237, 338-49 

Etymological Dictionary of Egyptian 343 

Etymologisches Wörterbuch des 
Ungarischen 345 

etymology 19, 21, 166, 172, 236-52, 259-60, 
261, 264, 267, 268, 271, 273, 275-7, 278, 281-6, 
288-9, 317, 319, 323, 335, 338-49, 362, 364-5, 
382-3, 386, 387, 457-71, 488, 512, 533, 536, 537, 
538, 548, 582, 606, 609, 612, 613-4, 615 

European Language Resource Association 
(ELRA) 63 

Euzenat, Jérôme 513 

evaluative meanings 138 


Evans, Vyvyan 433 

Evert, Stefan 71, 72, 412 

Everyman's Dictionary of First Names 272, 275 

examples, use of 4, 13, 19, 20, 22, 25, 29, 30, 
34-5, 39, 54, 76, 90, 134-5, 138, 157, 159; see 
also quotations 

explanation (as alternative term to 
definition) 123; see also definitions 

explanatory equivalence 150-4 

Extensible Stylesheet Language 
Transformations (XSLT) 72 

extension (as opposed to intension) 148 


Faber, Pamela 435 

Fachwortschatz Medizin Englisch 406 

false negatives 71 

Farmer, John Stephen 328, 332 

Fazly, Afsaneh 412 

Felber, Helmut 396 

Fellbaum, Christiane 108, 368, 419, 420, 422, 
444, 446, 454, 582 

Ferret, Olivier 510 

Fichte, Johann Gottlieb 592 

Field, John 258 

figurative uses 186-8, 200, 248, 362, 364 

Fillmore, Charles J. 56, 118, 119-21, 415, 416, 
417, 418, 419, 434 

Firth, John R. 49, 133, 412, 429 

Fischer, Andreas 369, 379 

Fischer, Kerstin 437 

Fishman, Joshua A. 592 

Flexner, Stuart Berg 332 

Florio, John 10, 608 

Fodor, Jerry A. 429, 442 

Fontenelle, Thierry 27, 50, 177, 511 

fonts 22 

foreignisms 18 

Forssman, Bernhard 340 

Fox, Gwyneth 90 

frame semantics 56-9, 119-21, 434 

FrameNet 119, 435 

Franklyn, Julian 289, 290 

Fransson, Gustav 278 

Französisches etymologisches Wörterbuch 340, 
342, 343, 345, 346, 347, 613 

Frege, Friedrich Ludwig Gottlob 102, 103, 
108, 115 
frequency 28, 29, 54, 67, 76, 81-2, 89, 140, 
174, 288-9, 496, 506, 557, 602; see also 
high-frequency words 

Freund, Wilhelm 355, 611 

front matter 39-40, 296, 340, 399, 527, 528, 587 

frWaC 63 

Fuertes-Olivera, Pedro A. 406 

Fuhrhop, Nanna 311, 312, 313 

function words 19, 225-6 

functional equivalence 150-4 

Funk, Robert 546 

Funk & Wagnalls dictionaries 486, 613 

Furetière, Antoine 13, 166 

Furnivall, F. J. 189, 612 


Gaelic Personal Names 272 

Gage Canadian Dictionary 597, 598, 600 

Gallmann, Peter 322 

games 41 

Gantar, Polona 92 

gazetteers 256-7, 265, 266, 268 

Geeraerts, Dirk 114-5, 425, 431, 433, 436, 443 

Geiriadur Prifysgol Cymru: A Dictionary of the 
Welsh Language 176, 177, 180-202, 614 

Gelling, Margaret 260, 265, 267 

Gelman, Susan 468 

general dictionaries 11-14; see also 
dictionaries for general readers 

generative grammar 429, 433 

genre 65, 199, 205, 317, 357 

genus (in taxonomy) 110-12 

genus word 19, 99, 108, 128, 401-2, 445 

German Reference Corpus: see DeReKo 

gerunds 234-5 

Geyken, Alexander 62, 422 

Gibbs, Henry Hucks 556 

Giesbrecht, Eugenie 71 

Gigafida 84 

Gilliéron, Jules 344 

Gimson, A. C. 294, 302 

Glinz, Hans 430 

glossaries and collections of glosses 9, 44, 
377, 605-6 

Glynn, Dylan 437 

Goddard, Cliff 105-7, 430 

Goethe Wörterbuch 172 

Goldberg, Adele E. 118 


Good Dictionary EXamples (GDEX) 90 

Goodenough, Ward H. 429 

Goodrich, Chauncey 548, 555, 611 

Google 584, 588 

Google Books 174, 191, 540, 542, 543 

Görlach, Manfred 602, 603 

Gove, Philip 494, 496, 556 

grammar 15, 19, 20, 24, 25, 26-7, 28, 30, 35-6, 
48, 84, 88, 94, 137, 184, 221-35, 325, 351-2, 356, 
361, 411, 457-71 

Grammar of English Words 26-7 

grammatical change 234-5 

grammaticalization 225 

Granger, Sylviane 585 

Great Vowel Shift 473-4, 478 

Greek-English Lexicon (Liddell and 
Scott) 169, 170, 357, 360, 361, 364, 611 

Green, Jonathon 172, 330, 470-1, 540 

Green, Melanie 433 

Greenbaum, Sidney 232 

Grefenstette, Gregory 217, 506, 581 

Gregg, Robert J. 598, 599, 600 

Grice, Paul 347 

Gries, Stefan T. 437 

Grimm, Jacob 14, 169, 170, 427, 611 

Grimm, Wilhelm 14, 169, 170, 427, 611 

Grimshaw, Jane 414, 415 

Grose, Francis 330 

Gross, Derek 453 

Grundy, Valerie 54-6, 566, 572 

The Guardian 66 

‘guide words’ 28, 29, 39 

Günther, Hartmut 311 

Guppy, Henry Brougham 287 

Gurevych, Iryna 580 


Hacker, Martina 480, 487 

Hadar, Linor 40, 60 

Hall, Fitzedward 554 

Hall, G. 580 

Hallig, Rudolf 371, 379, 430 

Halskov, J. 583 

Hannah, Jean 596 

Hanks, Patrick 45, 83, 101, 115, 117, 123, 131, 146, 
256, 272, 273, 278, 279, 283, 285-6, 287, 289, 
381, 402, 412, 432, 436, 507 

Hansen, Steffen Stummann 260 
Harbsmeier, Christoph 168 

‘hard words’ 3, 11, 12, 17, 24, 44, 488-9, 608-9 

Hardin, Ron 449 

Harris, Roy 448 

Harris, Zellig 429 

Harteveld, P. 175 

Hartmann, Reinhard 30, 60, 338, 374 

Harvey, Anthony 355 

Hatzfeld, Adolphe 428 

Haugen, Einar 590, 597 

Hausmann, Franz-Josef 145, 395, 396, 499, 591 

Haviland, John 4 

Hays, Gregory 352, 353, 355, 360, 364 

Hearst, Marti 456 

Heath, Tom 511, 512 

Hecht, Max 427 

Heider, Eleanor Rosch: see Rosch, Eleanor 

Helsinki Corpus of English Texts 206 

Henderson, John 355 

Henley, William Ernest 328 

Henry, Julie 552-3 

Herbst, Thomas 25, 28, 29, 30, 39 

Herder, Johann Gottfried 169, 592 

Herold, Axel 419 

Herrmann, Douglas J. 450, 452 

heteronyms 296, 461, 465, 467, 469 

Heuberger, Reinhard 31, 383 

Hey, David 282, 287, 288 

Heylen, Kris 456 

Higden, Ranulf 164 

high-frequency words 137; see also frequency 

Hillen, Michael 355, 356 

Historical Gazetteer of England’s 
Place-Names 270 

historical principles 163-4, 167-75, 
236-52, 610 

Historical Thesaurus of the Oxford English 
Dictionary 173, 188, 369-73, 375, 379, 
430, 615 

historical-philological semantics 426-8 

history of lexicography 3-4, 8-17, 23-4, 26-9, 
44, 293-4, 354-5, 367, 376-80, 427-8, 430-3, 
472-87, 488-500, 605-15 

Hodges, Flavia 273, 279, 282-3 

Hoey, Michael 84 

Hoffman, Katja 456 

Höfler, Manfred 342 


Höger, Rainer 522 

holonyms 440, 446 

homonymy 33, 237-42, 249, 268, 282, 287, 296, 
457-71, 473, 509 

Hornby, A. S. 25, 26, 27, 34, 42, 125, 126, 614 

Hotten, John Camden 330 

Hudson, Richard 546, 549 

Hueber Learner's Dictionary German-English/ 
English-German 528 

Hueber Wörterbuch Deutsch als 
Fremdsprache 520, 528 

Hüllen, Werner 11, 369, 372, 376, 377, 378, 379, 
397, 430 

Humbley, John 396 

Humboldt, Alexander von 173 

Hupka, Werner 516, 517, 518, 519, 521, 524, 
526, 527 

hybrid dictionaries: see bilingualized 
dictionaries 

hypernyms 443; see also hyperonyms 

hyperonyms 21, 157, 440-6, 450, 451, 510 

hyponyms, hyponymy 21, 99, 157, 378, 439-56 


Ide, Nancy 71, 502, 508, 509, 513 

ideophones 153 

Idiomatic and Syntactic English 
Dictionary: see Oxford Advanced Learner's 
Dictionary of Current English 

idioms 18, 27, 49, 54, 107, 117, 160, 184, 226, 232, 
248, 339, 344, 385, 411, 413, 415-24, 453, 516, 
538, 577 

IDM 565 

Ihre, Johann 165 

Illustrated Oxford Dictionary 520 

illustrations, inclusion of 21, 33-4, 39, 132-3, 
160, 334, 515-31 

Ilson, Robert 133, 516 

The Imperial Dictionary of the English 
Language 14, 611, 613 

inclusion (decisions about) 532-45, 566; see 
also wordlist, selection of 

index terms (in learners’ dictionaries) 28, 29 

Indo-European Etymological Dictionaries 
Online 341, 614 

Indogermanisches etymologisches 
Worterbuch 341, 613-14 

inflections 26, 35, 94, 191-2, 227-8, 299, 359, 361 
information graphs 511-12 

Insley, John 278 

intellectual property 506, 512 

intension 148 

interjections 139, 223, 227 

International Dictionary of Medicine and 
Biology 399, 400 

IPA (International Phonetic Alphabet) 20, 
292, 293, 294, 303, 305, 309, 383, 479, 481, 
483, 484, 485-6, 487 

Irish Names of Places 266 

Isidore of Seville 9, 606 

isomorphism 147 

ITP Nelson Canadian Dictionary 207, 600 

itWaC 63 


Jackendoff, Ray 411, 430 

Jackson, Howard 35, 40, 178, 402 

Jacobs, Joachim 311 

James, Arthur Lloyd 484 

James, Gregory 338, 374 

James, Patrick 364 

James, Terrence 260 

Jamieson, John 167-8, 169, 170, 610 

Jarvad, P 583 

Jespersen, Otto 305 

Joffe, David 91 

Johnson, Christopher R. 435 

Johnson, Mark 136, 419, 434 

Johnson, Samuel 3, 13, 14, 94-8, 130, 169, 177, 
178, 249-50, 372, 472-3, 475, 482, 486, 489, 
490, 492-3, 550, 551, 553-4, 555, 556, 558, 
578, 610 

Jones, Daniel 293, 295, 302 

Jones, E. D. 177 

Jones, Steven 443, 452, 453, 456 

Jones, William 170 

Junius, Franciscus 165 

Jurafsky, Daniel 412 


Kachru, Braj 31, 595 
Kahane, Henry 31 
Kalverkämper, Hartwig 395 
Karypis, G. 69 

Katz, Jerrold J. 429, 442 
Kay, Christian 372, 375, 615 
Kay, Paul 118, 417 


Kaye, Patricia 558 

Kearns, Kate 414 

Kemmer, Katharina 517, 521 

Kemp, A. 303 

Kendall, Joshua 378, 379 

Kennedy, Alistair 455 

Kenney, E. J. 363 

Kenrick, William 480-81 

Kernerman Publishing 59 

Kersey, John 12, 249, 251, 609 

keywords 82 

Kilgarriff, Adam 28, 45, 55, 76, 78, 82, 84-5, 87, 
89, 90, 92, 217, 423, 424, 436, 506, 507, 508, 
510, 565, 580, 582, 584, 587 

Kimhi, David 165 

Kimmel, Michael 584 

Kimura, Makimi 194 

Kipfer, Barbara Ann 34, 38 

Kirkness, Alan 169 

Kirkpatrick, Betty 39 

Klein, Wolf Peter 321, 322 

Klein, Wolfgang 422 

Klepousniotou, Ekaterini 457 

Klosa, Annette 523, 529, 530 

Kloss, Heinz 591, 593 

Kneen, J. J. 278, 281, 282 

Knowles, Elizabeth 174 

Koch, John T. 179 

Kohn, Amy S. 468 

Kohrt, Manfred 313 

Kolb, G. J. 13 

Kolln, Martha 546 

Konopka, Rafal 484 

Koplenig, Alexander 580 

KorpusDK 78 

Kosem, Iztok 85, 90, 92, 585 

Kosem, Karmen 585 

Koskela, Anu 456 

Kovář, Vojtěch 85 

Kramer, Johannes 341 

Krassnig, Albert 601 

Krek, Simon 92, 580, 584 

Kretzschmar, William 294, 384, 484, 596 

Krishnamurthy, Ramesh 78, 90 

Kristiansen, Gitte 433 

Kübler, N. 585 

Kučera, Antonín 400 
Kuhn, Sherman M. 185 
Kupietz, Marc 63, 66 
Kwary, Deny Arnos 402 
Kytö, Merja 206 


labels 21, 28, 37, 53, 56, 88-90, 157, 158-9, 183, 
185-6, 401-3, 406, 488-500, 564 

Labov, William 596 

Lachâtre, Maurice 15 

Ladefoged, Peter 307 

Ladusaw, William A. 305 

Lahey, Anita 601 

Lakoff, George 114, 115, 136, 419, 434, 435 

Lancashire, Ian 249 

Landau, Barbara 468 

Landau, Sidney I. 16, 21, 22, 30, 31, 37, 90, 125, 
203, 205, 398, 399, 401, 402, 464, 465, 486, 
498, 516, 524, 562, 600 

Langenscheidt Taschenwörterbuch Deutsch als 
Fremdsprache 517, 520 

language academies 12, 13, 166, 550 

language planning 590 

Lannoy, V. 587, 588-9 

Lara, Luis Fernando 524 

Larousse dictionaries 13, 15, 16, 18, 19, 21, 
23, 527 

Larousse, Pierre 15, 23 

Lass, Roger 307 

Lasselsberger, Anna Maria 313 

A Latin Dictionary (Lewis and Short) 355, 356, 
360, 611 

Laufer, Batia 40, 60, 584 

layout 22 

Lea, Diana 87 

Leacock, Claudia 510 

learners' dictionaries 1-2, 4, 21, 25-43, 66, 82, 
116, 122, 123-43, 374, 466-8, 497, 504, 506, 
520, 528, 530-1, 565, 579 

Learner's Dictionary of Current English: see 
Oxford Advanced Learner's Dictionary of 
Current English 

Lebeaux, David 416 

Legman, Gershon 328 

Lehrer, Adrienne 256, 442, 460 

Leibniz, Gottfried Wilhelm 94, 99, 101, 103, 
108, 115 

lemmatization (in corpora) 77 


Lenin, Vladimir Ilyich 15 

Le Page, R. B. 172, 615 

Leroyer, P. 581 

Lesk, Michael 502, 507 

Lessico etimologico italiano 340, 342, 345, 348 

less-used languages 4 

Levine, C. 584 

Lew, Robert 141, 504, 516 

Lewis, Charlton T. 355, 611 

lexical databases 504-14 

lexical fields 31, 33, 378, 428-31, 434 

lexical gaps 147, 151 

lexical innovations (in explaining meaning in 
bilingual dictionaries) 154-6 

lexical knowledge bases 502, 505, 510, 513 

Lexical Mark-up Framework (LMF) 503 

lexical sets 79 

lexicalization 147 

LexisNexis Academic 203, 215 

L'Homme, Marie-Claude 396, 406 

Li, L. 579 

Liberman, Anatoly 344 

licensing of data 73-5, 586 

Lichte, Timm 417 

Liddell, Henry 169, 170, 357, 611 

Lighter, Jonathan 172, 327 

light verb constructions: see support verb 
constructions 

Linde, Samuel Gottlieb 171, 610 

Linguistic Data Consortium (LDC) 63 

Linked Data initiative 511 

linking electronic dictionaries 173, 175 

Linnaeus, Carl 108, 110-12, 170 

listemes 421 

Little Oxford English Dictionary 466, 480, 484 

Littré, Emile 14, 20, 170, 427, 611, 612 

Lloyd, Thomas 177 

Lloyd, William 12, 609 

loan translations 155-6, 164 

loanwords 155-6, 172, 193-4, 237-44, 247, 
248, 252, 300, 340, 342-3, 346-7, 347-8, 356, 
458-9, 473-4, 488, 490-1, 537, 548 

Lobanova, Anna 451, 456 

Location Lingo 269 

log likelihood 83 

logDice 83 

Longman 17, 378; see also Pearson Longman 
Longman Dictionary of Contemporary 
English 27, 28, 29, 30, 32, 33-4, 35, 36, 37-8, 
126-7, 128, 132, 133-4, 137, 138, 140, 142, 443, 
467, 497, 511, 518, 521-2, 527, 528, 529, 585, 587 

Longman Dictionary of English Language and 
Culture 32, 527-8 

Longman Dictionary of the English 
Language 495-6 

Longman Exams Dictionary 79 

Longman Interactive English Dictionary 32 

Longman Language Activator 38 

Longman Lexicon of Contemporary 
English 374, 379, 443 

Longman Pronunciation 
Dictionary 292-309, 470 

Longman Słownik Współczesny 
Angielsko-Polski, Polsko-Angielski 157 

Lounsbury, Floyd 429 

Ludewig, P. 586 

lumping 46, 184, 330, 370, 381, 457 

Lynch, Jack 177 

Lyons, Sir John 148, 256, 429, 431, 442, 444, 
445, 460 


Maaler, Josua (Pictorius) 10 

machine learning 69 

machine-readable dictionaries 502-14 

machine translation 505, 583 

Mackin, R. 49 

MacLysaght, Edward 278, 279, 282 

MacMahon, Michael K. C. 293, 476, 483 

Macmillan 26 

Macmillan English Dictionary for Advanced 
Learners 29, 35, 36, 38, 39, 46, 52-3, 127, 132, 
135, 136, 138, 139, 140, 142, 435, 586 

Macmillan English Dictionary Online 90, 587 

macrostructure 310, 314-16, 320, 324, 369-70, 
372, 379, 399-400, 457, 462, 510, 577 

Mahecha Mahecha, Viviana 585 

Mair, Christian 205, 217 

Mair, Victor 583 

Malakhovski, L. V. 460, 462, 468 

Malkiel, Yakov 339-40, 344, 345, 348, 349, 464 

Malmgren, Sven-Goran 171 

management (of dictionary projects) 561-78 

Mangold, Max 293, 297, 303 

Manning, Christopher D. 505 


maps 21, 309, 383, 527, 584, 615 

Marconi, Diego 255 

Marello, Carla 59 

Markus, Manfred 383 

Martin, Robert 397 

Martinez Motos, Raquel 395 

mass nouns 228, 460 

Matisoff, James A. 341 

Matoré, G. 15 

Matsell, George 330, 331 

Maurer, David W. 329-30, 331 

Mayhew, Henry 385 

Maziére, F. 15 

McArthur, Tom 9, 11, 15, 372-3, 379, 443, 455, 
550, 551, 579, 582, 584, 589 

McCarthy, Michael 79 

McClure, Peter 282, 283, 286, 289, 290 

McDavid, Raven I. 330, 384 

McDermott, Anne 490 

McEnery, Anthony 412 

McKean, Erin 582 

McKinley, Richard A. 282, 288 

meanings, order of presentation of 19 

Meaning-Text theory 431 

Mees, Inger M. 293, 294 

megastructure 18 

Meisenburg, Trudel 312 

Mel'čuk, Igor A. 339, 431 

Ménage, Gilles 165, 609 

Mentrup, Wolfgang 320, 322 

‘menu words’ 39 

Mercer, Robert 67 

mergers (historical, of previously distinct 
lexical items) 246-7 

meronymy 46, 378, 439-56 

Merriam-Webster 16, 26, 121, 319, 453-4, 478, 
485-6, 502, 557, 579, 587, 611 

Merriam-Webster’s Advanced Learner's English 
Dictionary 29, 33, 35, 36, 38 

Merriam-Webster’s Collegiate Dictionary 18, 
19, 20, 21, 485 

Merriam-Webster New International 
Dictionary 478, 485, 613 

Mester, Arnim 414, 415 

metadata 68-71, 77, 501-14 

metalanguage 47-52, 53, 488-500 

meta-metadata 69 
Nolan, Francis 306 

Norman, Guy 403, 405 

Norri, Juhani 172, 495, 496, 498 
Nowell, Lawrence 165 
Nübling, Damaris 311 
Nunberg, Geoffrey 418, 419 
Nyrop, Kristoffer 426 


Oakes, Michael 83 

obsolete words 182-3, 448, 533, 538 

Ockham’s razor 122 

O'Connor, Mary Catherine 118 

offensive material (inclusion of) 18 

Office for National Statistics 273 

Ogden, C. K. 113 

Ogilvie, John 14, 611, 613 

Ogilvie, Sarah 4, 491 

O'Hare, Cerwyss 372 

OneLook 580-81 

Onions, C. T. 221, 491, 612, 614 

online dictionaries: see electronic publication 

ontologies 454, 502, 504, 509-12 

Onysko, Alexander 383 

Opitz, Kurt 395 

Ordbog over det danske Sprog 171, 613 

Ordnance Survey 266, 270 

orthography: see spelling and orthography 

Orton, Harold 195, 388 

Osselton, Noel 13, 169, 170, 182-3, 192-3, 
489, 495 

ostensive definition 97, 160 

Österreichisches Wörterbuch 597-8, 599, 601, 
602, 603 

Oxford Advanced Learner's Dictionary of 
Current English 25, 29, 30-1, 34, 35, 36, 
38, 42, 98, 116, 124, 125-6, 128, 134, 135-6, 
137, 138, 139, 142, 391, 467, 502, 520, 532, 
587, 614 

Oxford American Writer’s Thesaurus 442, 456 

Oxford BBC Guide to Pronunciation 298, 300, 
302, 304, 305 

Oxford Children’s Colour Dictionary 468-9 

Oxford Children’s English Corpus 552 

Oxford Collocation Dictionary for Advanced 
Learners of English 423 

Oxford Collocations Dictionary for Students of 
English 422 


Oxford Dictionaries Online 512, 539, 541, 544, 
545, 582 

Oxford Dictionary of English 19, 111, 116, 123, 
255, 445, 446, 464-6, 497, 538, 539, 554, 559, 
568, 586 

Oxford Dictionary of English Christian 
Names 272, 275 

Oxford Dictionary of English 
Etymology 345, 347 

Oxford Dictionary of National Biography 175 

Oxford Dictionary of Pronunciation for 
Current English 292-309, 470, 484 

Oxford Dictionary of Synonyms and 
Antonyms 453 

Oxford English Corpus 82, 83, 204, 422, 532, 
534, 539, 541, 542, 543, 544 

Oxford English Dictionary 2, 12, 14, 98, 116, 121, 
156, 163, 169, 170-1, 172-4, 175, 176, 178-202, 
221-35, 236-52, 255, 257, 261, 264, 270, 284, 
293, 305, 319, 326, 329, 331, 336, 338-9, 344, 
347, 369, 375, 382, 385, 389, 391, 400, 401, 427, 
483, 490-500, 525, 526, 532-45, 547-60, 561, 
582, 585, 612, 614, 615 

Oxford German Dictionary 156 

Oxford-Hachette French Dictionary 46, 48, 50, 
51, 52, 404 

Oxford IsiZulu-IsiNgisi/ 
English-Zulu Isichazamazwi Sesikole/School 
Dictionary 153 

Oxford Junior Dictionary 552 

Oxford Latin Dictionary 355, 356, 357, 360, 361, 
363, 365, 366, 615 

Oxford Learner's Thesaurus 87 

Oxford Primary Dictionary 468-9 

Oxford School Dictionary 468-9 

Oxford Text Archive 203, 205 

Oxford University Press 17, 25, 26, 27, 116, 289, 
355, 385, 484-5, 579, 586 


Padel, Oliver 263 
Palmer, H. E. 26, 125 
Pantel, Patrick 456 
PaperofRecord.com 205 
Paquot, Magali 585 
paradigmatic relations 21 
Paradis, Carita 453 
Pardon, William 558 
Parish, J. 586 

Partington, Alan 412 

part of speech 19, 35, 48, 70-1, 221-35, 460-1, 
464-5; see also grammar 

Part-of-Speech taggers 70-1, 77, 508 

Partridge, Eric 327-8, 331, 333 

Passow, Franz 168-9, 170, 555-6, 559, 610 

Passy, Paul 293, 303 

Pastor, Verónica 503 

Patkar, Madhukar M. 165 

Paul, Hermann 426, 428 

Pearsall, Judy 116 

Pearson, Jennifer 398, 404, 405 

Pearson Longman 26 

Peckham, Aaron 335 

Peeters, Bert 524 

Peirce, C. S. 516 

Peng, Jing 403, 404 

Penguin Book of Hindu Names 272 

Penguin Dictionary of British Surnames 279, 
281, 283, 285, 287, 288 

Penguin Dictionary of First Names 275, 276 

Penguin Dictionary of Surnames 279, 283, 285 

Penguin English Dictionary 478, 479-80 

Penhallurick, Robert 382, 383 

Pennacchiotti, Marco 456 

Personal Names of the Isle of Man 278, 
283, 284 

Peters, Jörg 311, 312 

Pfeiffer, Rudolf 165 

Pfister, Max 340 

Phillips, Edward 12, 249, 489, 609 

The Philological Society 163, 178, 188, 483, 547, 
612, 613 

phrases, phraseology (treatment of in 
dictionaries) 18, 27, 28, 81, 85, 96, 101, 102, 
107, 117-9, 122, 126, 127, 131-2, 133-4, 226-7, 
232-3, 405, 411-24, 551 

Piirainen, Elisabeth 160 

Pijnenburg, W. 490 

The Place-Names of Rutland 261-2 

plagiarism 330-32 

plurals, treatment of 227-8, 248 

pluricentric languages 591-603 

Pocket Oxford Dictionary 17, 19, 480, 484 

Pohl, Heinz Dieter 597, 603 

Polish National Corpus 62, 84 


polysemy 28, 30, 33, 39, 45, 46, 47, 150, 157, 158, 
249, 364, 374, 396, 397, 405, 434, 435-6, 441, 
457-71, 509 

Popkema, Anne 4 

Porphyry of Tyre 99 

Porzig, Walter 429 

Pottier, Bernard 429 

Pougens, Marie-Charles-Joseph de 167 

pragmatics 37, 102, 129, 130, 139, 160, 400, 415, 
433, 496, 500, 504 

Pratt, T. K. 212 

Praxmarer, Christoph 383 

Prentice, Norman N. 468 

prescription, prescriptivism 12, 401, 403, 489, 
490, 491-3, 498, 546-60, 597 

primordial sample (in corpus 
construction) 66-8 

Prinsloo, Danie 4, 203 

Proceedings of the Old Bailey Online 206 

Proctor, Paul 27 

production: see encoding 

Project Gutenberg 203, 205 

Project MUSE 203 

Promptorium Parvulorum 10, 607 

pronouncing dictionaries 292-309, 469-70 

A Pronouncing Dictionary of American 
English 294 

pronunciation 19, 20, 26, 27, 38, 40, 316, 317, 
323, 325, 326, 292-309, 472-87, 512, 585 

proper names (as evidence in historical 
dictionaries) 192, 201 

proper names (inclusion of in non-onomastic 
dictionaries) 17, 255-6, 297, 298, 299, 339 

ProQuest Historical Newspapers 205, 215, 
540, 544 

prototypes (semantic), prototype theory 110, 
112-6, 376, 433-4 

Pruvost, J. 15 

Przedlacka, Joanna 302 

Przepiórkowski, Adam 62 

Ptaszyński, Marcin Overgaard 498 

Pughe, William Owen 178 

Pujol, Didac 40 

Pullum, Geoffrey K. 305 

punctuation, guidance on 40 

Pustejovsky, James 103-5, 430, 458 

Putnam, Hilary 113, 114, 115 
qualia 103-5 

Quine, Willard van Orman 19, 97 

Quirk, Randolph 27 

quotations 13, 15, 16, 19, 20-1, 95, 97, 98, 135, 
163, 166, 167-8, 170, 171, 174-5, 176-202, 250, 
330, 342, 355, 357, 535, 539, 548, 549, 555-7, 
610, 612; see also examples 


Radetzky, Paula 4 

Ramson, William 557 

Random House Dictionary of the English 
Language 464-6 

Ray, John 372, 609 

Rayburn, Alan 266, 267 

Read, Allen Walker 12 

reading programmes 188, 539 

Real Academia Española 12, 610 

Reaney, P. H. 278, 279, 280, 282, 283, 285, 
286-7, 288 

Received Pronunciation 302, 475-6, 484 

reception: see decoding 

Rechtschreib-Duden 310, 312, 314-24, 598, 603 

Rechtschreib-Wahrig 310, 314-24 

Das Rechtschreibwörterbuch (Ickler) 323 

Redmonds, George 275, 278, 280, 282, 284, 
287, 288, 290 

reference (as opposed to sense) 102 

reflexive uses 233 

regional usage, regional varieties, regional 
variation 88, 171-2, 194-5, 206-20, 300, 
303, 357, 361, 381-92, 443, 448, 490, 495, 500, 
535, 540, 590-603; see also standard and 
non-standard lexis 

register 48, 56, 65, 68, 88, 103, 158, 185-6, 
194-5, 205; 238, 325-375 303; 357; 448, 449; 
488-500, 535,540 

Reichmann, Oskar 395 

Reimer, Marga 256 

relatedness 21 

relational semantics 429 

Rena, Irene 585 

Rennie, Susan 167 

representativeness (in corpora) 63-4, 
203-4, 354 

Resource Description Framework (RDF) 503 

Rey, Alain 9, 524 


Richards, I. A. 113 

Richardson, Charles 610-11 

Robert dictionaries 16, 18, 19, 20-1, 23, 343, 614 

Robins, R. H. 458, 462 

Rogers, Colin D. 288 

Roget, Peter Mark 108, 109, 368, 369, 370, 371, 
375, 377, 378-9, 380, 455, 606, 612 

Roma 72 

Romanisches etymologisches Wörterbuch 340 

Romary, Laurent 513 

Room, Adrian 267, 276, 290, 453 

Rosch, Eleanor 113, 114, 115, 444 

Ross, Alan S. C. 386 

Rossenbeck, Klaus 404, 405, 406 

Ruano-Garcia, Javier 383 

Rubery, Polly 288 

Rumble, Alexander R. 258 

Rundell, Michael 26, 42-3, 65, 79, 80, 82, 84-5, 
90, 92, 125, 132, 203, 204, 205, 446, 448, 451, 
458, 461, 467, 499-500, 504, 506, 538, 552, 
564, 565, 569, 576, 577, 578, 580, 581, 582, 584, 
586, 587 

Ruppenhofer, Josef 435 

Russell, Bertrand 103, 108, 115 

Russisches etymologisches 
Wörterbuch 346, 347-8 

Rutherford, Emily 101 

Rychlý, Pavel 83, 89 


Sagart, Laurent 341 
Sailer, Manfred 417 
Sainte-Palaye, Jean-Baptiste de 
Lacurne de 166 
sales of dictionaries 22, 33 
Salesbury, William 10, 608 
salience 434, 444 
Sampson, Geoffrey 455 
Samuels, Michael 380, 615 
SAMUELS tagging system 379 
Sanchez, Aquilino 407 
Saussure, Ferdinand de 103, 339, 428, 442, 525 
Ščerba, Lev 145-6, 147, 151-2 
Schaeder, Burkhard 320, 396 
Scherer, Carmen 322 
Schermer, G. M. 4 
Scheuringer, Hermann 597, 603 
Schierholz, Stefan J. 396 

Schmid, Hans-Jörg 394 

Schmid, Helmut 71 

Schneider, J. G. 168, 611 

Scholze-Stubenrecht, Werner 520 

Schone, Patrick 412 

school dictionaries: see children’s dictionaries 

Schoonheim, Tanneke 173 

Schrader, Norbert 167 

Schrodt, Richard 597, 602 

Schuessler, Axel 341 

Schütze, Hinrich 505, 508 

Schweickard, Wolfgang 171, 338, 341, 347 

scientific vocabulary 140, 171-2, 393-407, 435; 
see also technical vocabulary 

Scots Thesaurus 372, 376 

The Scotsman 205 

Scott, Mike 78 

Scott, Robert 169, 170, 357, 611 

Scottish Christian Names 272 

Scottish Surnames 279, 281 

Scrabble 7, 16 

Scragg, D. G. 16 

search engine optimization 588 

Seldon, Anthony 552-3 

semantic change 346 

semantic web 454 

semi-bilingual dictionaries: see bilingualized 
dictionaries 

sense (as opposed to reference) 102 

Shakespeare, William 248-9, 371, 555, 557 

Sheidlower, Jesse 200, 201 

Sheridan, Thomas 472-3, 475, 477, 480 

Short, Charles 355, 611 

Shvaiko, Pavel 513 

sign language lexicography 4, 344 

‘signposts’ 28, 39 

Sijens, Hindrik 173 

Silva, Penny 169, 180, 184 

Simple English Wikipedia 143 

Simple English Wiktionary 142-3 

Simpson, John 184, 199, 301, 543, 615 

Sinclair, John 27, 28, 34, 79, 83, 103, 116-19, 130, 
131, 133, 203, 412, 432, 557 

single-author dictionaries (i.e. of the works of 
a single author) 172, 359 


Sino-Tibetan Etymological Dictionary and 
Thesaurus 341 

Sitta, Horst 322 

Skeat, Walter William 257-8 

Sketch Engine 55, 78, 82, 84-8, 91, 423, 565 

Skinner, Stephen 165, 609 

slang 18, 66, 139, 146, 158, 194-5, 212, 241, 
303, 385, 386, 392, 394, 488, 492, 494, 498, 
499, 500, 538, 540, 553, 602; see also slang 
dictionaries, colloquial language, standard 
and non-standard lexis 

slang dictionaries 66, 172, 289, 290, 325-37, 
464, 469-71, 540 

Sledd, James H. 13 

Sloane, Sir Hans 11 

Slobin, Dan I. 151 

Slovar’ Akademii Rossiiskoi 166 

Slovene WordNet 92 

Smith, A. H. 261, 266, 275 

Smith, Raoul N. 446 

Smith-Bannister, Scott 275 

Snell, George 550-1, 553, 559 

social variation 384-5, 387 

sociolinguistics 98, 269, 325, 329, 352, 357, 
384-5, 386, 551-2, 591, 594, 596, 597 

Soehn, Jan-Philipp 421 

Solander, Daniel 269 

sound files 292, 307-8 

sound symbolism 375 

source language 144 

Souter, Alexander 355 

specialist consultants 282, 337, 399-400 

species (in taxonomy) 110-12 

spelling and orthography 13, 24, 292, 310-24, 
328, 359, 361, 387, 388-9, 473-4, 512 

spelling dictionaries 310-24 

Spence, Thomas 477, 482 

Spevack, Marvin 371 

Spittal, Jeffrey 258 

splitting 46, 184, 330, 370, 381, 457 

splits (historical, of a single lexical item into 
two) 247-51 

Spohr, Dennis 511 

Spoken Early English Dialect (SPEED) 
project 383 

Stadtler, Thomas 348 
Stammwortprinzip 166 

standard and non-standard lexis, standard 
varieties, standardization 12, 13, 17, 23, 
24, 37, 48, 171, 184, 190, 193, 194, 198, 
229, 247, 248, 258, 265, 281, 301-3, 311, 
312, 318, 325, 356, 358, 361, 382, 384-92, 
464, 472, 484, 485-6, 490, 492, 494, 499, 
546, 548-52, 590-603; see also slang, 
regional usage 

Stark, M. P. 35 

Stathi, Katerina 420 

statistics 81-2, 502, 504-6 

Stefanowitsch, Anatol 437 

Stein, Gabriele 518 

Stenton, Sir Frank M. 276 

Stern, Gustaf 426 

Stevenson, Patrick 594 

Stevenson, Suzanne 412 

Storjohann, Petra 529 

Strathy Corpus of Canadian English 210, 
213, 214 

stratification (in corpus construction) 64-5 

Stray, Christopher 355, 356, 363 

Struan, Andrew 375 

structuralism 21, 428-30 

Stubbs, Michael 412, 432 

Sturiale, Massimo 293 

style guides (for dictionary editing) 564, 569, 
574, 579, 577, 578 

substitutability (in definitions) 94-8, 102, 128, 
129, 184-5, 448 

Summers, Della 17 

support verb constructions 414-24 

Surnames of Ireland 278, 283, 287 

Survey of English Dialects 195, 388-9 

Survey of English Place-Names 257, 260-1, 
266, 270 

Survey of English Usage 27 

Svensén, Bo 32, 33, 125, 147, 154, 186, 203, 205, 
206, 488, 498, 499, 516, 519, 523, 524, 591 

Svenska Akademiens Ordbok 171, 613 

Swanepoel, Piet 435, 464 

Swift, Jonathan 372 

Swiggers, Pierre 348 

symbols, use of 16, 48, 197-8, 399, 400, 402, 
488, 489, 491, 492, 495, 497, 527; see also 
pronunciation 


synonyms, synonymy 11, 21, 24, 28, 86, 87, 
95-8, 102, 109, 122, 124, 127, 128, 133, 135-6, 
150, 157, 158, 181, 184-5, 370, 372-3, 374, 377, 
378, 380, 381, 402, 415, 429, 430, 431, 433, 436, 
439-56, 510, 511, 544, 554, 582 

syntagmatic relations 21 

synthesis (in the compilation of corpus-based 
bilingual dictionaries) 56 

Szczepaniak, Renata 516 

Szpakowicz, Stan 455 


taboo words 18, 37, 48, 139, 186, 497, 538, 545, 
552, 558, 600 

Takács, Gábor 343 

Tan, Kim Hua 39, 128 

target language 144 

Tarp, Sven 394, 395, 396, 403, 404, 405, 406, 
462, 464, 470, 513 

taxonomies 69, 70, 107, 108, 371, 372, 374, 
377, 397, 402, 443-4, 445, 450, 451, 502, 504, 
511, 512 

Taylor, John R. 433, 449 

technical vocabulary 140, 391-2, 393-407, 
435, 443 

teleological perspective (in historical 
lexicography) 244, 246, 248-9 

Temmerman, Rita 406, 435 

Text Creation Partnership 205 

Text Encoding Initiative (TEI) 71-2 

text type 62, 65, 66, 67, 68, 77, 204, 358, 499 

thematic dictionaries 23 

Thesaurus linguae Latinae 169, 353, 355, 356, 
360, 362, 364, 366, 380, 613 

A Thesaurus of Old English 371, 372, 380 

thesauruses 21, 41, 52, 60, 108, 109, 136, 278, 
367-80, 440, 445, 446, 449-50, 451, 453-6, 
510, 582 

Thim, Stefan 347 

Thinkmap Visual Thesaurus 446, 455-6 

Thomas, Alan R. 194 

Thuresson, Bertil 278 

Thurmair, Gregor 505 

TickBox Lexicography 91 

The Times of India 205 

The Times (London) 205, 217, 559 

Titford, John 279, 282, 285, 288 

Tjong Kim Sang, Erik 456 


TLex Dictionary Production System 91 

Tognini-Bonelli, Elena 78-9 

tokenization (in corpora) 71, 77 

tone (in Chinese) 305 

Tooth, Edgar 282, 288 

topic domains: see domain 

training (of lexicographers) 332, 561-78 

transfer (in the compilation of corpus-based 
bilingual dictionaries) 55 

transferred uses 186-8, 362, 364 

transitivity 129, 182, 231-3, 460 

translation 51-3, 144-6, 148-50, 153, 363, 
577-8; see also encoding, decoding 

translation equivalence 149-50 

Trap-Jensen, Lars 506, 582 

TreeTagger 71 

Trench, Richard Chenevix 169, 552, 555, 556, 
558, 612 

Trésor de la langue française 174, 338-9, 
345, 615 

Trier, Jost 428, 442 

troponyms 444, 454 

Trotter, David 190 

Trudgill, Peter 384, 551-2, 596 

T-score 83 

Tucker, D. K. 279 

Tuggy, David 458, 462 

Tuteja, A. 586 


UCREL Semantic Annotation System 
(USAS) 379 

ukWaC 63, 80, 83 

Ullmann, Stephen 371 

unambiguous use (in grammatical 
analysis) 222 

Ungerer, Friedrich 394 

Upton, Clive 296, 386, 387, 389, 390, 484 

Urban Dictionary 142, 335, 336, 533 

usage 19, 20, 25, 37, 45, 56, 76, 129, 139, 157, 159, 
335, 357, 399, 488-500 


van Assem, Mark 511 

van der Kloot, Willem A. 36 

van der Meer, Geart 435 

van Langendonck, Willy 339 

van Niekerk, A. E. 175 

van Sterkenburg, Piet 44, 445, 449, 453 
Varantola, K. 564 

Varvaro, Alberto 341 

Vasmer, Max 346 

Verkuyl, Henk 185 

Verlinde, Serge 513 

Véronis, Jean 502, 507, 508, 513 

Victor, Terry 333, 334 

video 40, 43, 515, 521, 530 

Viétor, Wilhelm 293 

virtual corpora 66-8 

Virtual Language Observatory (VLO) 63 

Vocabolario degli Accademici della Crusca 166, 
550, 609 

Volanschi, A. 585 

Voltaire 166-7 

von Wartburg, Walther 340, 345, 346, 371, 379, 
430, 613 

Vroegmiddelnederlands woordenboek 171, 173 


Waag, Albert 426 

Wachter, J. G. 165 

Wakelin, Martyn F. 384-5, 389 

Walker, Crayton 37 

Walker, J. 293, 296, 299, 302, 477, 480 

Walters, John 177 

Wandschneider, Marc 587 

The Washington Post 205 

Waugh, Doreen 260 

Webster, Noah 13, 15, 379, 610, 611, 612, 613 

Webster's College Dictionary 485-6 

Webster’s Third New International 
Dictionary 121, 464-6, 478, 485-6, 494, 
496, 556, 615 

Wegner, Immo 529 

Weiner, Edmund 543, 615 

Weinreich, Uriel 19, 382 

Weisgerber, Leo 428 

Weka toolkit 70 

Wells, J. C. 294, 300, 304, 390, 477 

Welsh Surnames 278, 281, 284 

Wentworth, Harold 332 

Werner, Reinhold 146, 404, 405 

West, Michael 26, 27, 125, 614 

Wexler, P. J. 167 

Whitaker’s Almanack 15 

Whitcut, Janet 37 

Whitney, William Dwight 13, 613 
Whitt, J. Kenneth 468 

Widdowson, J. D. A. 386, 387 

Wiegand, Herbert Ernst 394, 395, 525 

Wierzbicka, Anna 97, 105-7, 415, 430 

Wiesinger, Peter 593, 595, 597, 598, 601, 602, 603 

Wiggins, David 102 

Wikipedia 511, 528; see also Simple English 
Wikipedia 

Wiktionary 142, 336, 533, 583; see also Simple 
English Wiktionary 

Wild, Kate 375 

Wilhite, Steve 301 

Wilkes, G. A. 327 

Wilkins, John 12, 94, 101, 108-9, 114, 115, 372, 
377, 609 

Wilkinson, P. R. 370 

Wilks, Yorick 502, 508, 509 

Williams, Edwin 421 

Willners, Caroline 453 

Wilson, Andrew 412 

Wilson, R. M. 278, 279, 280, 282, 283, 285 

Windsor Lewis, Jack 295, 300 

Witten, Ian H. 70 

Wittgenstein, Ludwig 103, 113, 115 

Wodak, Ruth 595, 599 

Wojciechowska, Sylwia 435 

Wood, Mary 419 

Woordeboek van die Afrikaanse taal 175 

Woordenboek der Friese Taal / Wurdboek fan 
de Fryske taal 173, 615 

Woordenboek der Nederlandsche Taal 
(WNT) 170, 171, 173, 427, 612 

Worcester, Joseph Emerson 293, 475, 485, 611 

word class: see part of speech 

word games 7, 16, 17 


wordlist, selection of 18-19, 345, 565-7; see 
also inclusion 

WordNet 108, 109, 368, 431, 435, 446, 449, 
454-5, 502, 509-12, 582 

Wordnik 512, 582, 588 

Word Routes bilingual thesauruses 60 

word sense disambiguation 433, 502, 
506, 507-9 

word sense induction 508-9 

WordSmith Tools 78, 82 

Worm, Ole 11 

Wright, Elizabeth Mary 171 

Wright, Joseph 195, 382, 383, 385, 390, 613 

Wright, Sue 592 

Wüster, Eugen 435 

Wyld, Henry C. K. 478, 483-4 


XML 72, 91, 503, 568, 570, 572, 573, 577 
Xu, Hai 35 


Yamada, Shigeru 29 
Yang, Wen Xiu 37 
Yarowsky, David 508 
Yong, Heming 403, 404 


Zaenen, Annie 507, 510, 513 

Zalizniak, Anna A. 346 

zero derivation: see conversion 

Zgusta, Ladislav 146, 160, 166, 168, 169, 180, 
394, 395, 405, 430, 448, 458, 464, 547 

Zhao, Y. 69 

Zock, Michael 510 

Zolli, Paolo 346 

Zurich English Newspaper Corpus 205 

Zwitserlood, Inge 4 